GlaxoSmithKline

Data Engineer II

GlaxoSmithKline, Seattle, Washington, US, 98127


Site Name: London The Stanley Building, San Francisco, USA - Washington - Seattle
Posted Date: Feb 6 2024

The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step-change in our ability to leverage data, knowledge, and prediction to find new medicines. We are a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data/metadata/knowledge platforms, and AI/ML and analysis platforms, all geared toward:

- Building a next-generation data experience for GSK's scientists, engineers, and decision-makers, increasing productivity and reducing time spent on "data mechanics"
- Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
- Aggressively engineering our data at scale to unlock the value of our combined data assets and predictions in real time

Data Engineering is responsible for the design, delivery, support, and maintenance of industrialized, automated, end-to-end data services and pipelines. The team applies standardized data models and mappings to ensure data is accessible to end users through end-to-end user tools via APIs. They define and embed best practices, ensure compliance with Quality Management practices, and align with automated data governance. They also acquire and process internal and external, structured and unstructured data in line with product requirements.

A Data Engineer II is a technical contributor who can take a well-defined specification for a function, pipeline, service, or other component, devise a technical solution, and deliver it to a high standard. They have a strong focus on the operability of their tools and services, and they develop, measure, and monitor key metrics for their work, seeking opportunities to improve those metrics.
They are aware of, and adhere to, best practices for software development in general (and data engineering in particular), including code quality, documentation, DevOps practices, and testing. They ensure the robustness of our services and serve as an escalation point in the operation of existing services, pipelines, and workflows.

A Data Engineer II should be deeply familiar with the most common tools (languages, libraries, etc.) in the data space, such as Spark, Kafka, and Storm, and aware of the open-source communities that revolve around these tools. They should constantly seek feedback and guidance to further develop their technical skills and expertise, and should take feedback well from all sources in the name of development.

Key responsibilities for the Data Engineer II include:

- Builds modular code/libraries/services using modern data engineering tools (Python/Spark, Kafka, Storm, ...) and orchestration tools (e.g., Google Cloud Workflows, Cloud Composer/Airflow)
- Produces well-engineered software, including appropriate automated test suites and technical documentation
- Develops, measures, and monitors key metrics for all tools and services, and consistently seeks to iterate on and improve them
- Ensures consistent application of platform abstractions to maintain quality and consistency with respect to logging and lineage
- Is fully versed in coding best practices and ways of working, and participates in code reviews and partnering to improve the team's standards
- Adheres to the QMS framework and CI/CD best practices
- Provides L3 support for existing tools/pipelines/services

Why you?

Basic Qualifications:

We are looking for professionals with these required skills to achieve our goals:

- 4+ years of data engineering experience with a Bachelor's degree, or 2+ years of data engineering experience with a PhD or Master's degree
- Cloud experience (e.g., AWS, Google Cloud, Azure, Kubernetes)
- Experience in automated testing