Frederick National Laboratory for Cancer Research

Director, Bioinformatics, CGR Cancer Genomics Research Lab

Frederick National Laboratory for Cancer Research, Rockville, Maryland

Job ID: req4165
Employee Type: exempt full-time
Division: Clinical Research Program
Facility: Rockville: 9615 MedCtrDr
Location: 9615 Medical Center Drive, Rockville, MD 20850 USA

The Frederick National Laboratory is a Federally Funded Research and Development Center (FFRDC) sponsored by the National Cancer Institute (NCI) and operated by Leidos Biomedical Research, Inc. The lab addresses some of the most urgent and intractable problems in the biomedical sciences in cancer and AIDS, drug development and first-in-human clinical trials, applications of nanotechnology in medicine, and rapid response to emerging threats of infectious diseases.

Accountability, Compassion, Collaboration, Dedication, Integrity and Versatility; it's the FNL way.

PROGRAM DESCRIPTION

We are seeking an enthusiastic, resourceful, and seasoned bioinformatics professional to join our leadership team and direct the exceptional Bioinformatics Group at the Cancer Genomics Research Laboratory (CGR). Located at the National Cancer Institute (NCI) Shady Grove campus in Rockville, MD, and operated by Leidos Biomedical Research, Inc., CGR collaborates with the NCI’s Division of Cancer Epidemiology and Genetics (DCEG), the world’s leading cancer epidemiology research group. Our scientific team leverages cutting-edge technologies to investigate genetic, epigenetic, transcriptomic, proteomic, and molecular factors that drive cancer susceptibility and outcomes. We are deeply committed to the mission of discovering the causes of cancer and advancing new prevention strategies through our contributions to DCEG’s pioneering research involving 70 investigators and their global collaborators.

Our diverse team of 20 bioinformaticians, engineers and data scientists supports DCEG’s multidisciplinary family- and population-based studies by working closely with epidemiologists, biostatisticians, and basic research scientists in DCEG’s intramural research program. We provide exceptional end-to-end bioinformatics support for genome-wide association studies (GWAS), methylation, targeted, whole-exome, whole-transcriptome and whole-genome sequencing along with viral and metagenomic studies from both short- and long-read sequencing platforms. This includes the analysis of germline and somatic variants, structural variations, copy number variations, gene and isoform expression, base modifications, viral and bacterial genomics, and more. Additionally, we advance cancer research by integrating the latest technologies such as single-cell, multiomics, spatial transcriptomics, and proteomics, in collaboration with the Functional and Molecular and Digital Pathology Laboratory groups within CGR. We extensively analyze large population databases such as All of Us, UK Biobank, gnomAD and 1000 Genomes to inform and validate GWAS signals, study the association between genetic variation and gene expression, and develop polygenic risk scores across diverse populations.

Our bioinformatics team develops and implements sophisticated, cloud-enabled pipelines and data analysis methodologies, blending traditional bioinformatics and statistical approaches with cutting-edge techniques like machine learning, deep learning, and generative AI models. We prioritize reproducibility through the use of containerization, workflow management tools, thorough benchmarking, and detailed workflow documentation. Our infrastructure and data management team works closely with researchers and bioinformaticians to maintain and optimize a high-performance computing (HPC) cluster, provision cloud environments, and curate and share large datasets.

The successful candidate will oversee day-to-day bioinformatics activities, ensuring the delivery of high-quality, well-annotated and interpretable datasets to our DCEG collaborators. He/she must demonstrate scientific and technical leadership for the group, horizon scanning for future approaches, and the responsibility for establishing, maintaining, and monitoring the key performance indicators of the team. If you have experience managing a diverse group of interdisciplinary scientists, then come lead our talented team of bioinformaticians dedicated to understanding the genetics of cancer.

KEY ROLES/RESPONSIBILITIES

Function as a scientific thought leader within the CGR bioinformatics group and the CGR leadership team, and manage collaborative analytical efforts with CGR laboratory and DCEG investigators with an eye towards scientific productivity and reproducibility.
Provide technical expertise within a highly collaborative environment, fostering a culture of engagement, innovation, and continuous improvement.
Collaborate closely with DCEG PIs on scientific manuscript development, submission, and revision activities with significant co-authorship and potentially first authorship opportunities.
Research and guide the development, execution, and continual improvement of robust, tested pipelines adhering to the latest standards for a wide variety of computational genomics applications, with an emphasis on scalability, portability, and thorough documentation.
Apply computational biology methods to integrate, analyze and visualize results from datasets obtained from various genotyping platforms including whole-genome, methylation and targeted genotyping data using Illumina Infinium and EPIC arrays, qPCR and other applications.
Utilize strong bioinformatics expertise and data analysis skills to process, integrate and interpret data obtained from diverse sequencing platforms including whole-genome, exome, transcriptome, methylome, microbiome, and targeted sequencing, generated by Illumina, Pacific Biosciences, Ion Torrent, Oxford Nanopore and other platforms.
Collaborate with pathologists and investigators, and utilize statistical, machine-learning and deep-learning methods to integrate, analyze and visualize results from data sets obtained from a wide variety of single-cell and spatial platforms, including NanoString GeoMx, 10x Chromium single-cell and Visium spatial, Ultivue and others.
Manage the dissemination of data to DCEG investigators and other collaborators, including adequate technical and scientific annotation required for interpretation. Support posting of data sets to public data repositories (dbGaP/SRA/GDC) as required based on NIH genomic data sharing policies and in concert with existing ontologies and FAIR principles.
Develop and lead strategic approaches for IT infrastructure support, including management of on-prem resources, archiving strategies for all data streams with CGR and NCI staff utilizing FAIR data principles, implementation of appropriate cloud resources, software install support and data migration support.
Coordinate with CGR laboratory leadership to assess changes in laboratory processes, including benchmarking protocol improvements and new instrumentation and applications.
Manage and coordinate with additional resources throughout DCEG, including fellows, post-docs, and other contract resources, to support an integrative, collaborative, collegial environment.

BASIC QUALIFICATIONS

To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:
Possession of a Bachelor’s degree from an accredited college/university according to the Council for Higher Education Accreditation (CHEA) in computer science, software engineering, bioinformatics, statistics, or related field required, or four (4) years relevant experience in lieu of degree.
Master’s or PhD strongly preferred.
Foreign degrees must be evaluated for U.S. equivalency.
In addition to educational requirements, a minimum of ten (10) years of progressively responsible scientific software engineering and/or complex system management/bioinformatics experience, including eight (8) years of experience in a leadership/manager capacity.
Team-oriented with excellent written and verbal communication skills, organizational skills, and attention to detail; ability to organize and execute multiple projects in parallel both independently and as part of working groups, and to interact effectively with cross-functional teams, including laboratory, LIMS, project management, data scientists, epidemiologists and biostatisticians.
Proven ability to establish and maintain management best practices including documentation of work, development of and reporting on key performance indicators, and cost containment strategies.
Demonstrated experience in successfully managing bioinformaticians and serving as a mentor for other bioinformatics and/or software developer staff members on analysis approaches and best practices.
Experience managing complex pipelines, including support of best-practice full lifecycle software development and source code management (GitHub, GitLab, unit testing), is required.
Proven experience and ability to benchmark tools and compare performance to existing pipelines.
Familiarity with workflow management systems and environment/dependency management tools.
Experience managing large computational tasks in a Linux-based high-performance computing environment.
Proven ability and demonstrated strategies to stay current with new bioinformatics analysis tools and software technologies.
Experience with commonly used scripting languages in bioinformatics such as Bash, Python, R and Perl and integrated development environments such as Eclipse and Visual Studio.
Ability to obtain and maintain a security clearance.

PREFERRED QUALIFICATIONS

Candidates with these desired skills will be given preferential consideration:
Familiarity with tools for primary analysis of genotyping arrays (GenomeStudio, VerifyIDIntensity, PLINK, GRAF, KING, Michigan Imputation Server) and DNA/RNA sequencing data (BWA, Bowtie2, STAR, featureCounts, Cell Ranger, Space Ranger, Loupe Browser, Harmony, fastp, FastQC, FastQ Screen, GATK Suite, bedtools, Samtools, VerifyBamID, fastNGSadmix, Somalier, Kraken2/Bracken, etc.).
Thorough understanding of secondary analysis tools for germline and somatic variant calling, normalization, joint genotyping and annotation, such as DeepVariant, HaplotypeCaller, Strelka2, Mutect2, VarScan, LoFreq, BCFtools, GLnexus, ANNOVAR, InterVar, ClinVar, SnpEff, SnpSift, VEP, etc.
Hands-on experience with secondary analysis of bulk, single-cell and multiomics datasets using DESeq2, Seurat, SingleR, Signac and other R/Python libraries. Familiarity with spatial transcriptomics analysis tools.
Tertiary analysis approaches for variant filtering, classification, burden and association testing with tools including REGENIE and SAIGE.
Experience with traditional statistical approaches for genotype-phenotype integration, machine learning algorithms to predict disease outcomes and exposure to the applications of deep learning in the single-cell, spatial transcriptomic and image analysis domains.
Experience with digital pathology image analysis including familiarity with commercial and open-source tools (e.g. HALO, Visiopharm, QuPath) for analysis of whole slide and TMA images.
Understanding of long-read sequencing (e.g. PacBio, Oxford Nanopore) based analysis (short tandem repeat analysis, assembly, large mutation detection, phasing), microbiome analysis tools (e.g. QIIME 2, DADA2, MetaPhlAn) and viral genome characterization. Eagerness to learn about emerging sequencing platforms.
Experience in utilizing parallel processing techniques with CPUs, GPUs and FPGAs, efficient resource management and time optimization on high-performance clusters and cloud environments. Knowledge of schedulers such as Slurm and SGE for large-scale job scheduling and management.
Ability to execute large genomics projects on Google Cloud or AWS by setting up the managed cloud environments, obtaining security clearance, containerizing the software and choosing the appropriate genomic applications and storage options.
Strong knowledge of DevOps tools designed for project management, documentation, reproducibility and collaboration such as GitHub, Jira, Docker/Singularity/conda environments and workflow management tools such as Snakemake, WDL or Nextflow.
Hands-on experience with commercial and open-source bioinformatics and genomics analysis software (IPA, Partek Flow, NVIDIA Parabricks, Sentieon, DRAGEN).
Exposure to datasets from large biobanks such as All of Us and UK Biobank and their applications in cancer research.
Ability to incorporate multiple data sources for use in downstream analysis, including microarray genotype data, sequencing data, epidemiological data, publicly available data sources (e.g. TCGA, ENCODE, 1000 Genomes, COSMIC, TOPMed, gnomAD, ESP) and diverse genomic annotations.
Experience with prioritization in a high-pressure environment, coaching and mentoring of the staff with a focus on skill development and productivity.

Commitment to Diversity

All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law.
Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.

Pay range: 194,100.00 - 333,625.00

The posted pay range for this job is a general guideline and not a guarantee of compensation or salary. Additional factors considered in extending an offer include, but are not limited to, responsibilities of the job, education, experience, knowledge, skills, and abilities as well as internal equity and alignment with market data. The salary range posted is a full-time equivalent salary and will vary depending on scheduled hours for part-time positions.