Pfizer

Sr Associate Machine Learning Data Engineering (Evergreen)

Pfizer, Cambridge, Massachusetts, US, 02140


Pfizer Worldwide Research, Development, and Medical (WRDM) is expanding its work in applying Machine Learning (ML) and Artificial Intelligence (AI) technologies to Biomedical Research. This will enable Pfizer to enhance our drug discovery efforts, sustain our industry-leading R&D productivity, and deliver breakthrough medicines to the patients most in need. The centerpiece of this initiative is the establishment of an "ML Research Hub," a new group charged with mastering state-of-the-art machine learning techniques to create novel predictive models and tools used across WRDM. Data Engineering will be key to the success of this new group given the enormous scope of the chemical, biological, omics, and clinical data available at Pfizer. The WRDM Research Hub is seeking experienced data engineers with a background in machine learning and software engineering, strong technical problem-solving skills, and experience creating scalable data pipelines and infrastructure for training, validating, and deploying ML solutions into production for broad use.

Role Responsibilities

The successful candidate will work with ML research scientists to enable our proprietary data and external datasets to be leveraged for ML modeling. This will be accomplished by implementing end-to-end data workflows for large-scale data ingestion, processing, tagging, and publishing, with an eye towards improving ML model performance over time.

Basic Qualifications

Bachelor’s degree in Computer Science, Statistics, Applied Mathematics, Chemistry, Physics, a life science discipline, or a related technical discipline.
Training or work experience in Python, Java, Scala, C++, or SQL.
Training or work experience in software design, development, and algorithm-related solutions for production-grade systems using machine learning.
Knowledge of one or more scientific data types (e.g., biomedical images, biomedical text, large-scale multidimensional 'omics, large- or small-molecule therapeutics, clinical or Real-World Data).

Preferred Qualifications

MS or 2 years of relevant research experience.
Familiarity with high-performance computing (HPC) environments (SLURM/LSF/SGE schedulers).
Familiarity with cloud computing infrastructure, including Amazon Web Services (AWS), and distributed computing libraries (e.g., Spark, Hive, Impala, Kafka).
Understanding of containerization and orchestration tools (e.g., Docker, Singularity, Airflow, Luigi, Kubernetes).
Basic knowledge of CI/CD and automation tools (Terraform, CloudFormation, Jenkins, Ansible, etc.).
Passion and curiosity for data and proven ability to take ideas from prototype to production.

Technologies We Use

Python, Java, C++, Slurm-based on-premises compute clusters, Google Cloud Platform, AWS, Docker, Singularity, Kubernetes, and Python libraries (NumPy, Pandas, Dask, PyTorch, TensorFlow, scikit-learn, RDKit, Weights & Biases, etc.).
