RunPod

Machine Learning Engineer

RunPod, San Francisco, California, United States, 94199


Position:

ML Engineer - Full Time - Remote

Reports to:

Head of Data

Salary Range:

Company Overview:

RunPod is a fast-growing startup that empowers developer teams to deploy custom, full-stack AI apps simply and at scale. We seek a talented and experienced ML Engineer to join our dynamic team.

Job Summary:

As an ML Engineer, you will be responsible for building the next-generation, highly available, global GPU cloud computing service with open-source technologies to enable and accelerate RunPod’s rapid growth. This system spans many diverse environments (containers, VMs, and bare-metal compute) and provides a cohesive and reliable abstraction for running AI workloads in them. You will get to be a technology thought leader, evangelize new, cutting-edge technologies, and solve complex problems. To be successful in this role, you should have experience practicing infrastructure as code, strong software development fundamentals and skills, and strong systems knowledge and troubleshooting abilities.

Requirements:

- 2+ years of experience writing high-performance, well-tested, production-quality code
- 2+ years of software development experience and proficiency in Python
- Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking and storage, performance, and scale
- Experience working on applied ML/AI products in production
- Knowledge of distributed systems and HPC
- Experience with TensorFlow and JAX is a plus
- Pragmatic, methodical, well-organized, detail-oriented, and self-starting
- Experience with containerization, VPNs, and AI workloads is a plus
- Knowledge of GPU programming, NCCL, and CUDA is a plus
- Experience in at least one backend programming language is a plus
- Familiarity with open-source inference and training stacks such as vLLM, TGI, TensorRT, and torchrun is a plus
- Demonstrated experience with high-performance or distributed cloud microservice architectures, ideally including building and operating them at global scale, is a plus

Responsibilities:

- Perform architecture and research work for AI workloads
- Work on the core RunPod AI platform
- Create services, tools, and developer documentation
- Create testing frameworks for robustness and fault tolerance

Compensation Package:

RunPod's compensation package comprises three elements: salary, equity, and benefits. We are committed to pay fairness and aim for all three elements to be highly competitive with market rates. In addition to this position's salary, equity will be a component of total compensation. The exact amount will be communicated when the offer is issued.

Join Us:

At RunPod, you’ll have the opportunity to work on cutting-edge technology and make a significant impact on the AI and ML fields. We encourage you to apply if you’re driven by innovation and excellence and want to be part of a team that values bold ideas and professional growth. Let's shape the future of technology together!

Non-Discrimination in Hiring Practices:

RunPod is committed to maintaining a workplace free from discrimination and upholding the principles of equality and respect for all individuals. Our hiring practices are designed to ensure fairness, objectivity, and inclusiveness, adhering to all applicable laws and regulations regarding nondiscrimination.
