Genesis Therapeutics Inc.
Principal Infrastructure Engineer Lead
Genesis Therapeutics Inc., Burlingame, California, United States, 94012
Genesis Therapeutics is building a world-class computational team to solve problems in drug discovery through machine learning, biophysical simulation, and computational chemistry. We are looking for a Principal Infrastructure Engineer who is excited to help develop new medicines and play a critical role in building out our AI platform.
You Will
Lead our infrastructure team to maintain and grow our multi-cloud compute infrastructure that supports our ML model training, computational chemistry research, and ongoing drug discovery efforts.
Build out our configuration and procedures for monitoring, resource allocation, and deployment automation, as we continue to grow our autoscaling compute clusters to handle larger workloads.
Work on improvements to our internal job scheduling system to increase our execution throughput, reliability, and compute utilization across heterogeneous pipelines.
You Are
An engineer with 5+ years of experience building and maintaining cloud infrastructure at scale, e.g. within AWS or GCP.
Proficient with Python, Bash, Terraform, and Kubernetes.
Ideally, experienced in building and maintaining compute clusters that run distributed ML training jobs on 1,000+ GPUs.
Nice to have: hands-on experience with physical hardware and datacenter management.
What We Offer
The opportunity to work on high-impact infrastructure that accelerates the discovery of new medicines.
A world-class, tight-knit team of good-hearted people across software, machine learning, computational chemistry, medicinal chemistry, and biology.
Competitive salary and equity. Medical, dental, and vision insurance, and a 401(k) program.