JobRialto
Data Engineer - Clinical Decision Support Solutions
JobRialto, West Sacramento, California, us, 95798
Job Summary
Client is seeking a skilled Data Engineer to join the Microbiology R&D Development Science functional team. This role involves developing and maintaining end-to-end data and machine learning (ML) pipelines to support clinical and verification studies. The position focuses on ML model deployment throughout its lifecycle, ensuring efficient data handling and processing for data science and analytics projects. The Data Engineer will work in a hybrid role, based in West Sacramento, California, and report to the Data Science Manager. This is an excellent opportunity for those passionate about data engineering and contributing to world-class biotechnology.
Key Responsibilities
- Collaborate with Stakeholders: Work closely with stakeholders to understand data requirements for machine learning (ML), data science, and analytics projects.
- Data Management and Processing: Assemble, clean, and harmonize large, complex datasets from diverse sources. Write code, scripts, and queries to efficiently extract, visualize, and process big data sets.
- Pipeline Development: Develop pipelines for optimal extraction, transformation, and loading (ETL) of data using Python, SQL, Spark, and AWS big data technologies.
- Schema Design: Design data schemas to support the Data Science team's development needs.
- Continuous Improvement: Identify and implement process improvements, such as automating manual processes and optimizing data delivery methods.
- ML Inference Pipeline Maintenance: Design, develop, and maintain a dedicated ML inference pipeline on the AWS platform (e.g., SageMaker, EC2).
- Deployment and Monitoring: Deploy inference models on EC2 instances or Amazon SageMaker and establish a data pipeline to store and track inference output, model performance, and KPI benchmarks.
- Documentation and Training: Document data processes, write recommended data management procedures, and create training materials on data management best practices.

Required Qualifications
Education: BS or MS in Computer Science, Computer Engineering, or equivalent experience.

Experience:
- 5-7 years of experience in data engineering and MLOps, developing and deploying data and ML pipelines.
- 5+ years of experience deploying ML models via AWS SageMaker and AWS Bedrock.
- 5+ years of programming and scripting experience using Python, SQL, and Spark.
- Deep knowledge of AWS core services such as RDS, S3, API Gateway, EC2/ECS, and Lambda.
- Hands-on experience with model monitoring, drift detection, and automated retraining processes.
- Experience implementing CI/CD pipelines with tools such as GitHub (Workflows and Actions), Docker, Kubernetes, Jenkins, and Blue Ocean.
- Experience working in an Agile/Scrum-based software development structure.
- 5+ years of experience with data visualization and/or API development for data science users.

Preferred Qualifications
Skills & Knowledge:
- Advanced expertise in AWS cloud platforms and services.
- Knowledge of big data technologies and best practices for data pipeline development and deployment.
- Experience with Agile/Scrum methodologies and collaborative team environments.

Physical Requirements
Ability to work in a hybrid office environment, collaborating with both local and remote teams.
Location
West Sacramento 1, 1584 Enterprise Blvd., West Sacramento, CA 95691, United States
Education:
Bachelor's Degree