ZipRecruiter

Lead ML Operations Engineer (MLOps)

ZipRecruiter, San Francisco, CA, United States


Job Description

Company Overview: Welcome to the forefront of machine learning operations (MLOps)! Our company is dedicated to leveraging the power of machine learning to drive innovation and transform industries. We're committed to developing cutting-edge ML solutions that deliver real-world impact and value to our customers. Join us and lead our team in shaping the future of MLOps.

Position Overview: As the Lead ML Operations Engineer, you'll be responsible for leading our MLOps efforts and driving the design, implementation, and optimization of infrastructure and processes for deploying, monitoring, and managing machine learning models at scale. You'll lead a team of talented engineers and collaborate closely with data scientists, software engineers, and DevOps teams to streamline the machine learning lifecycle and ensure reliable and efficient model operations. If you're a seasoned engineer with a passion for machine learning and a track record of designing and implementing MLOps solutions, we want you on our team.

Key Responsibilities:

  1. Technical Leadership: Lead and mentor a team of ML Operations Engineers, providing guidance, direction, and support in driving MLOps innovation and execution.
  2. Infrastructure Design: Design and implement scalable and reliable infrastructure for deploying and serving machine learning models, leveraging cloud platforms and containerization technologies.
  3. Model Deployment: Develop automated pipelines for deploying machine learning models into production environments, ensuring consistency, reliability, and reproducibility.
  4. Monitoring and Alerting: Implement monitoring and alerting systems to track model performance, data drift, and other metrics, enabling proactive detection and mitigation of issues.
  5. Model Versioning and Management: Establish version control and management processes for machine learning models, enabling easy tracking, rollback, and experimentation.
  6. Continuous Integration/Continuous Deployment (CI/CD): Implement CI/CD pipelines for automating model training, testing, and deployment, reducing time to market and improving agility.
  7. Scalability and Efficiency: Optimize the performance and scalability of machine learning infrastructure, leveraging techniques such as distributed computing, parallelization, and resource management.
  8. Security and Compliance: Ensure machine learning systems comply with security and privacy standards, implementing access controls, encryption, and other security measures as needed.
  9. Documentation and Best Practices: Document MLOps processes, best practices, and standards, providing guidance and training to data scientists and engineers.
  10. Collaboration: Collaborate with cross-functional teams, including data scientists, software engineers, and DevOps teams, to streamline the machine learning lifecycle and drive continuous improvement.
  11. Research and Innovation: Stay informed about the latest advancements in MLOps tools and technologies, exploring innovative approaches and techniques to enhance machine learning operations.

Qualifications:

  • Bachelor's degree or higher in Computer Science, Engineering, Mathematics, or a related field.
  • 7+ years of experience in software engineering, DevOps, or related roles, with a focus on building and maintaining infrastructure for machine learning operations.
  • Leadership experience, with a demonstrated ability to lead and mentor a team of engineers.
  • Strong understanding of machine learning concepts and techniques, with experience working with data science teams and machine learning models.
  • Proficiency in programming languages such as Python, Java, or Scala, and experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Experience with containerization technologies such as Docker and orchestration tools such as Kubernetes.
  • Familiarity with machine learning frameworks, libraries, and lifecycle tools such as TensorFlow, PyTorch, scikit-learn, or MLflow.
  • Experience with CI/CD pipelines, version control systems, and automation tools such as Jenkins, GitLab, or CircleCI.
  • Strong problem-solving skills and analytical thinking, with the ability to troubleshoot complex issues and optimize system performance.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams and communicate technical concepts to non-technical stakeholders.

Benefits:

  • Competitive salary: The industry standard salary for Lead ML Operations Engineers typically ranges from $150,000 to $250,000 per year, depending on experience and qualifications.
  • Comprehensive health, dental, and vision insurance plans.
  • Flexible work hours and remote work options.
  • Generous vacation and paid time off.
  • Professional development opportunities, including access to training programs, conferences, and workshops.
  • State-of-the-art technology environment with access to cutting-edge tools and resources.
  • Vibrant and inclusive company culture with opportunities for growth and advancement.
  • Exciting projects with real-world impact at the forefront of MLOps innovation.

Join Us: Ready to lead the charge in MLOps innovation? Apply now to join our team and shape the future of machine learning operations!
