KLA
Sr. Platform Engineer - GenAI
KLA, Ann Arbor, Michigan, US, 48113
Base Pay Range:
$103,000.00 - $175,100.00 Annually
Primary Location:
USA-MI-Ann Arbor-KLA
KLA’s total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits. Interns are eligible for some of the benefits identified below. Our pay ranges are determined by role, level, and location. The range displayed above reflects the minimum and maximum pay for this position in the primary location identified in this posting. Actual pay depends on several factors, including location, job-related skills, experience, and relevant education level or training. If applicable, your recruiter can share more about the specific pay range for your preferred location during the hiring process.
Company Overview
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA places a strong emphasis on innovation, investing 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.
Group/Division
The Information Technology (IT) group at KLA is involved in every aspect of the global business. IT’s mission is to enable business growth and productivity by connecting people, process, and technology. It focuses not only on enhancing the technology that enables our business to thrive but also on how employees use and are empowered by technology. This integrated approach to customer service, creativity and technological excellence enables employee productivity, business analytics, and process excellence.
Job Description/Preferred Qualifications
Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable solutions
Develop advanced AI/ML infrastructure solutions that enhance the efficiency of our skilled ML teams
Design and implement solutions for critical areas, including distributed storage systems, scheduling systems, high availability capabilities, and core reliability issues within our large-scale GPU clusters
Monitor and optimize the performance of our AI/ML infrastructure, ensuring high availability, scalability, and efficient resource utilization
Develop and deploy automation tools, monitoring solutions, and operational strategies to streamline infrastructure management and reduce manual tasks
Work with various teams, including ML developers, data engineers, and DevOps professionals, to create a cohesive and integrated AI/ML infrastructure ecosystem
Implement and manage GPU infrastructure within Kubernetes clusters to support high-performance computing and AI/ML tasks
Deploy and manage open-source GenAI components, such as vector databases and various AI/ML models, ensuring seamless integration and optimal performance
Evaluate and integrate new open-source GenAI tools and technologies to enhance the platform’s capabilities
Collaborate with the research and development teams to implement and optimize innovative AI/ML models and algorithms
Ensure the security and compliance of open-source GenAI components within the infrastructure
Leverage High-Performance Computing (HPC) experience to optimize and manage large-scale AI/ML workloads
Design, implement, and manage on-premises, cloud, and hybrid-based ML platforms to support diverse AI/ML workloads and ensure flexibility and scalability
Minimum Qualifications
Bachelor's Degree or equivalent training/certifications in Computer Science or a related IT field
Eight (8) years of experience implementing and maintaining AI/ML infrastructure in on-premises environments
Strong experience with AI/ML infrastructure and tools, including GPU clusters and Kubernetes
Proficiency in deploying and managing open-source GenAI components and vector databases
Hands-on experience with high-performance computing (HPC) environments
Expertise in designing and managing on-premises, cloud, and hybrid-based ML platforms
Solid understanding of distributed storage systems, scheduling systems, and high availability capabilities
The company offers a competitive and comprehensive total rewards package, including but not limited to: medical, dental, vision, life, and other voluntary benefits; a 401(k) with company matching; an employee stock purchase program (ESPP); student debt assistance; a tuition reimbursement program; development and career growth opportunities and programs; financial planning benefits; wellness benefits, including an employee assistance program (EAP); paid time off and paid company holidays; and family care and bonding leave.
KLA is proud to be an Equal Opportunity Employer. We do not discriminate on the basis of race, religion, color, national origin, sex, gender identity, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other status protected by applicable law. We will ensure that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us at talent.acquisition@kla.com or at +1-408-352-2808 to request accommodation.