Understanding Recruitment Inc

Software Engineer - Supercomputing Platform & Infrastructure

Understanding Recruitment Inc, San Francisco, California, United States, 94199


Introduction:

Are you a software engineer with a passion for building resilient and optimized solutions for AI workloads? We are seeking a Software Engineer for our Supercomputing Platform & Infrastructure team to work on massive computing clusters. This role can be based in San Francisco or remote.

About the Company:

We are a forward-thinking organization committed to advancing humanity’s progress by developing safe AGI. Our mission focuses on automating research and code generation, leveraging frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute. We aim to enhance model reliability and alignment beyond human capabilities.

About the Role:

As a Software Engineer on our Supercomputing Platform & Infrastructure team, you will be integral to designing and building highly available and secure AI training and inference infrastructure. You will ensure the reliability and optimization of GPU workloads, troubleshoot complex issues, and improve the efficiency of our engineering processes.

What We Can Offer You:

- Significant equity as part of total compensation
- 401(k) plan with 6% salary matching
- Comprehensive health, dental, and vision insurance for you and your dependents
- Unlimited paid time off
- Flexible work options: in-person in San Francisco or remote
- Visa sponsorship and relocation stipend

Key Responsibilities:

- Build and maintain a software stack for large-scale (thousands of GPUs) AI training and inference infrastructure
- Troubleshoot and resolve issues across GPU resources, networking, OS, drivers, and cloud environments
- Automate detection and recovery processes to ensure high availability and security
- Investigate and resolve incidents affecting security and availability
- Develop solutions to enhance engineering efficiency and speed
- Proactively support the research and engineering teams

Keywords:

In this role as a Software Engineer, you will use networking technologies and cloud platforms such as GCP, AWS, and Azure, and apply your IaC knowledge with tools such as Terraform or Pulumi. Your expertise will ensure the reliability and optimization of our AI workloads and GPU deployments.
