Accenture

Full Stack MLOps Engineer

Accenture, Columbia, Maryland, United States, 21046


Technology Architecture Associate Manager

Mid-Level | Full-time

We Are

Nextira, now part of Accenture, builds cloud-based solutions and services with cutting-edge engineering skills, artificial intelligence (AI), machine learning (ML), and data analytics that enable clients to design, build, launch, and optimize high-performance computing environments.

You Are

We are seeking a highly motivated and technically skilled MLOps Engineer to join our team in supporting our large-scale, GPU-based AI training and research cluster hosted on AWS. The ideal candidate has a strong foundation in Linux, a solid understanding of operating systems, and the ability to communicate effectively with highly technical users on a wide range of technical topics.

The Work

As an MLOps Engineer, you will participate in the design, development, and operational management of cloud-native computing clusters for ML training and inference. You will help ML engineering teams optimize performance and troubleshoot issues with their training workloads.

You will manage HPC (High Performance Computing) clusters, including schedulers such as Slurm and compute nodes accelerated with NVIDIA GPUs. You will help users configure and manage their Conda environments, optimizing them for their specific AI training and research needs. You will also design, deploy, and maintain cloud infrastructure with infrastructure-as-code (IaC) tools such as Terraform and AWS CDK.

Our MLOps Engineers communicate clearly and effectively with highly technical users, providing support and guidance on a wide range of technical topics related to the cluster. They use their strong Linux skills to troubleshoot and resolve issues, optimize system performance, and ensure a stable and reliable environment for AI training and research.

Travel may be required for this role. The amount of travel will vary from 0 to 100%, depending on business need and client requirements.
