Broad Institute
Senior Machine Learning Engineer I
Broad Institute, Cambridge, Massachusetts, US, 02140
The Senior Machine Learning Engineer will participate in research and development efforts aimed at solving problems in analyzing large-scale clinical data, with a mission of improving human health. The candidate will work with multiple modalities, including imaging data, time series, and clinical notes. The candidate will also develop methods to ascertain disease outcomes and to characterize risk factors from large electronic health record data sets. Rich representations of clinical data derived from deep learning models will be used in conjunction with genetic data to investigate the genetic basis for disease. The ideal candidate has both a theoretical and a practical understanding of deep learning techniques and a proven track record in areas such as clinical research, computational biology, probability, statistics, or data science.
The candidate joins a strong team of machine learning practitioners, has access to vast amounts of clinical data, and is encouraged to publish new methods and results in academic journals and conferences. The candidate will conduct research in clinical ML and disease biology, and must collaborate effectively with researchers at the Broad Institute and beyond. This position is suited to a person who is excited by the prospect of learning, adapting, and applying modern machine learning techniques to solve the key challenges for emerging clinical data modalities, with revolutionary implications for advancing state-of-the-art clinical practice.
Responsibilities
Adapting and applying existing machine learning techniques to clinical datasets
Developing novel machine learning methods for understanding and organizing unstructured datasets
Developing robust and generalizable inference algorithms that advance the state of the art
Writing well-crafted, maintainable, scalable, and performant machine learning code
Designing, developing, and maintaining testing frameworks for machine learning code
Developing techniques for characterizing, processing, and storing large real-world clinical datasets
Requirements
Master's degree in Computational Biology, Computer Science, Physics, Math, Statistics, or related quantitative fields, or relevant experience
5-7 years designing and training models on large, complex, and/or biased datasets
5-7 years of experience across deep learning frameworks like Keras, TensorFlow, or PyTorch and machine learning packages (sklearn, etc.)
Fluent with data modeling, indexing, ETL, and cloud-based pipelines (expertise with MLflow or other MLOps packages, BigQuery, Spark, or equivalent)
Strong bash/shell scripting and proficiency with UNIX operating systems
Familiarity with NumPy and pandas
Strong communication skills and ability to collaborate with clinicians, data scientists, and software engineers on model requirements and design
Preferred Skills
Knowledge of software engineering best practices including version control and writing tests
Knowledge of MLOps best practices
Experience developing data pipelines to prepare data for modeling from large, messy data sets
Experience working with clinical data or omics data