Arrayo
Data Engineering & Pipelining Lead
Arrayo, Boston, Massachusetts, US 02298
We are excited to be expanding our Life Science Data Analytics Platform group in our Boston and Cambridge offices, and we are looking for a Data Engineering & Pipelining Lead to join the team.
Arrayo helps top-tier life sciences clients implement effective data analytics strategies. We make data assets available and accessible for advanced analytics, so that their inherent value can be realized more readily.
The Data Engineering & Pipelining Lead will be responsible for driving the socialization and utilization of technologies, algorithms, models, and methods for science-driven data analytics R&D projects. As an Arrayo team member, you will work to understand users’ requirements and drive the definition, design, implementation, and validation of cutting-edge pipelines and models used to process and analyze diverse sources of data.
Responsibilities:
Develop data pipelines to extract, transform, and load data from varied sources and formats.
Work in collaboration with key scientific personnel to build, test, adapt, support, and validate pipelines with integration into production systems.
Manage the definition, design, implementation, and validation of data pipelines and models to analyze data from diverse sources.
Write custom scripts to extract data from unstructured/semi-structured sources.
Apply advanced pipeline technologies, including Prefect, Nextflow, Airflow, Cromwell, KNIME, Databricks, Luigi, petl, and AWS Data Pipeline.
Leverage big-data technologies for data processing, including Apache Spark, Kubernetes, Apache Pulsar, AWS (Lambda, S3, Athena).
Deliver solutions efficiently within an agile process.
Contribute to many different projects in a dynamic, fast-moving environment.
Collaboratively translate scientific and business questions into data and analytics requirements.
Drive rapid prototyping for further implementation of analytical products.
Partner with SMEs to translate modeling outputs into business language.
Work with IT resources to enable appropriate data flow/data models.
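The extract-transform-load duties above can be sketched in a few lines. This is a minimal, illustrative pure-Python sketch (orchestrators such as Prefect or Airflow would wrap stages like these in scheduled, monitored tasks); the CSV fields and record shape are hypothetical:

```python
import csv
import io
import json

def extract(raw_csv: str) -> list[dict]:
    """Extract: parse rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: normalize field names, trim whitespace, cast types."""
    return [
        {"sample": r["Sample"].strip().upper(), "reads": int(r["Reads"])}
        for r in rows
    ]

def load(records: list[dict]) -> str:
    """Load: serialize to JSON lines; a real pipeline would write to a
    database, S3, or a downstream analytics store instead."""
    return "\n".join(json.dumps(r) for r in records)

raw = "Sample,Reads\n s1 ,100\n s2 ,250\n"
print(load(transform(extract(raw))))
```

In a production setting, each stage would become an independently retryable task so failures are isolated and observable.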
Requirements:
B.S. in information systems, computer science, computer engineering, or a related field, with 4+ years of experience in bioinformatics, genomics, genetics, or other science-related environments. M.S. or Ph.D. preferred.
Knowledge of a subset of analytical approaches (e.g., machine learning, statistical analysis, predictive modeling, visual analytics).
Proficiency building, running, and monitoring pipelines on cloud computing environments.
Experience in commonly used command-line NGS tools is a plus (BWA, SAMTools, Bowtie2, Picard, PINDEL, GATK, etc.).
Ability to understand and communicate statistical measures for interrogating the quality of data manipulation is preferred.
Demonstrated ability to communicate efficiently and work effectively with a team of scientists and other engineers.
Experience with SQL and modeling relational databases. PostgreSQL experience preferred.
Experience using and designing web services and REST APIs.
Knowledge of software development best practices.
Experience working in cloud computing environments (e.g., AWS, Azure) is preferred.
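To illustrate the SQL and relational-modeling requirement above: a minimal sketch of a normalized two-table schema with a foreign key and an aggregate join. The schema (samples and sequencing runs) is hypothetical, and Python's built-in sqlite3 stands in for PostgreSQL so the example is self-contained:

```python
import sqlite3

# In-memory database; in practice this would be a PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE run (
        id        INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL REFERENCES sample(id),
        reads     INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO sample (id, name) VALUES (1, 'S1')")
conn.executemany(
    "INSERT INTO run (sample_id, reads) VALUES (?, ?)",
    [(1, 100), (1, 250)],
)

# Aggregate reads per sample via a join across the foreign key.
total, = conn.execute(
    "SELECT SUM(reads) FROM run "
    "JOIN sample ON run.sample_id = sample.id "
    "WHERE sample.name = 'S1'"
).fetchone()
print(total)  # prints 350
```

Keeping runs in their own table rather than denormalizing read counts onto the sample row is the standard relational design choice: it allows any number of runs per sample without schema changes.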