Principal Machine Learning Infrastructure Engineer
Acceler8 Talent, San Francisco, CA, United States
Introduction:
Join a pioneering team at the forefront of AI and ML technology, where human-computer collaboration is not just a concept but a reality. Our team is dedicated to revolutionizing user experiences by innovating at every level, from user interfaces down to the most efficient models. This is more than a job; it's a journey into the future of technology.
About the Company:
Our company thrives on the belief that a small, focused group of talented individuals can generate significant breakthroughs in the AI field. We're a multi-disciplinary team driven to solve complex, real-world AI challenges. Backed by industry giants and venture capital powerhouses, we're well-positioned to reshape the landscape of AI and ML technologies.
About the Role:
As a Principal ML Infrastructure Engineer, you'll work closely with researchers and product engineers to create magical product experiences powered by large language models. You'll lead the design and implementation of scalable ML systems running on high-performance computing clusters. Your expertise will transform the infrastructure for training and serving, pushing the boundaries of AI technology.
What We Offer You:
- A role where your contributions have a direct impact on groundbreaking AI advancements.
- A collaborative, innovative environment that fosters growth and learning.
- Competitive compensation and benefits, including relocation assistance for those moving to San Francisco.
- Access to cutting-edge technology and resources.
Key Responsibilities:
- Collaborate on the development of large language models using state-of-the-art frameworks.
- Drive performance tuning for AI model training and inference workloads.
- Develop and optimize training and serving infrastructure, including custom kernel writing.
- Implement parallelism strategies for efficient, large-scale training of AI models.
Relevant Keywords: Large Language Models, Machine Learning, High-Performance Computing, Distributed Systems, Scalability, AI Accelerators, Quantization, Kernel Languages, Cloud Services, Containerization, Network Fundamentals.