Planet Pharma
Data Scientist
Planet Pharma, Cambridge, Massachusetts, United States
Location: Cambridge, MA (onsite 3 days/week)
Salary: $110,000 - $150,000
Type: Direct hire with a biotechnology company
Title: Data Scientist, Computational Biology

Role Overview
An innovative biotech company is seeking a talented Data Scientist to enable the discovery of new insights from its extensive and growing DECODE immune synapse database. The successful candidate will work at the interface of data analytics, data mining, statistics, bioinformatics, and machine learning, with broad impact across early discovery, candidate development, and biomarker discovery. This role is also responsible for analyzing large, multi-dimensional datasets and developing methods to identify and visualize signal in noisy data. The selected candidate will be part of the Computational Analytics Team, working alongside Computational Engineering and interfacing directly with experimentalists in Platform Discovery, Immunology, and Protein Sciences to scope, build, and implement computational solutions. The ability to work in a fast-paced, highly collaborative environment and to communicate effectively across teams will be critical to success. The Data Scientist will take ownership of challenging projects and approach problems systematically to achieve robust solutions. The ideal candidate is a team player who contributes positively to group and company culture.

Key Responsibilities
Use statistical techniques to find relationships in complex biological data.
Develop, evaluate, and implement robust analytical methods, models, workflows, and apps as needed for in-house discovery and development.
Use analytical methods to identify patterns, signals, and features in highly multiplexed experimental assay data.
Assist in the conception, development, optimization, and assessment of machine-learning models.
Maintain familiarity with the scientific literature to support the development and benchmarking of new methods.
Build and deploy visualizations and user interfaces for use by wet-lab scientists.
Support various teams in the processing and interpretation of next-generation sequencing (NGS) data, and ensure timely delivery of results.
Maintain high-quality documentation of work and discoveries, including written reports, electronic lab notebooks, technical presentations for internal or external audiences, internal database records, code comments, and software documentation.
Communicate key data insights, along with ongoing project status updates, setbacks, and changes in strategy, to various audiences within R&D.
Manage and execute multiple projects across matrixed teams, working with leadership to meet short timelines while maintaining scientific rigor.
Seek out external resources and expertise when required.

Required Qualifications
Master’s degree in Data Science, Bioinformatics, Computational Biology, Machine Learning, Statistics, Mathematics, or Physics, and 3 years of professional experience. PhD preferred.
Extensive experience working with multi-dimensional datasets.
Extensive experience with Python analysis libraries, including pandas, NumPy, and SciPy.
Experience applying principal component analysis, multivariate regression, ANOVA, Bayesian statistics, and/or other statistical methods in a biological field to identify relevant parameters and/or outcomes.

Preferred Qualifications
Experience processing and/or building pipelines for next-generation sequencing data, including gene expression, whole-exome, TCR, and single-cell data.
Familiarity with machine-learning model development, optimization, and assessment.
Familiarity with the development of deep generative models (e.g., autoregressive models, VAEs, CNNs, GANs).
Demonstrated expertise in core programming languages and environments, including Python, R, SQL, and bash scripting.
Experience working in cloud computing environments.
Experience with AI frameworks such as TensorFlow, Keras, PyTorch, or scikit-learn.
Experience with Python Streamlit or R Shiny apps.
Experience with data visualization packages such as matplotlib, seaborn, plotly, altair, and ggplot2.