ZipRecruiter

Lead Machine Learning Infrastructure Engineer

ZipRecruiter, San Francisco, CA, United States


Company Overview: Welcome to the forefront of AI-driven innovation! Our company is a trailblazer in leveraging machine learning to revolutionize industries. We're committed to building robust infrastructure that powers our machine learning models at scale. Join us and lead our efforts in shaping the future of AI infrastructure engineering.

Position Overview: As the Lead Machine Learning Infrastructure Engineer, you'll drive the design, development, and optimization of our machine learning infrastructure. You'll lead a team of skilled engineers and collaborate closely with cross-functional teams to deliver high-quality, scalable, and reliable infrastructure that supports our machine learning workflows. If you're a seasoned engineer with expertise in machine learning infrastructure technologies and a proven track record of leading successful projects, we invite you to apply.

Key Responsibilities:

  1. Technical Leadership: Provide strategic guidance, mentorship, and technical leadership to a team of machine learning infrastructure engineers, fostering a culture of excellence, innovation, and collaboration.
  2. Infrastructure Design: Lead the design and architecture of scalable and reliable infrastructure solutions to support machine learning workflows, including data ingestion, model training, evaluation, and deployment.
  3. Data Pipeline Development: Lead the development and optimization of data pipelines to ingest, preprocess, and transform data for training machine learning models, ensuring data quality, integrity, and scalability.
  4. Model Training Infrastructure: Design and optimize infrastructure for training machine learning models at scale, leveraging distributed computing frameworks and accelerators for performance and efficiency.
  5. Model Deployment: Lead the design and implementation of systems for deploying and managing machine learning models in production environments, ensuring reliability, scalability, and real-time inference capabilities.
  6. Monitoring and Logging: Implement robust monitoring and logging solutions to track the performance and health of machine learning infrastructure and models, proactively identifying and resolving issues.
  7. Automation and Orchestration: Develop automation and orchestration tools to streamline machine learning workflows, reducing manual intervention and improving operational efficiency.
  8. Security and Compliance: Implement security controls across machine learning infrastructure and workflows to protect sensitive data and ensure compliance with data privacy regulations.
  9. Documentation and Best Practices: Define and promote best practices for machine learning infrastructure engineering, ensuring clear and comprehensive documentation to facilitate understanding and collaboration among team members.
  10. Collaboration: Collaborate closely with data scientists, machine learning engineers, and software developers to understand requirements and deliver infrastructure solutions that meet business needs.
  11. Mentorship and Development: Mentor and coach junior engineers, providing guidance, support, and opportunities for skill development and career growth, and foster a culture of continuous learning and improvement within the team.

Qualifications:

  • Bachelor's degree or higher in Computer Science, Engineering, Mathematics, or related field.
  • 8+ years of experience in infrastructure engineering, with a focus on machine learning infrastructure.
  • Proven leadership experience, with a track record of successfully leading machine learning infrastructure teams and delivering complex projects.
  • Expertise in cloud platforms such as AWS, Azure, or Google Cloud Platform, and services like Amazon SageMaker, Azure Machine Learning, or Google AI Platform.
  • Strong programming skills in languages such as Python, Java, or Scala, with experience in distributed computing frameworks like Apache Spark or TensorFlow.
  • Experience with containerization technologies such as Docker and container orchestration platforms such as Kubernetes.
  • Strong understanding of machine learning concepts and techniques, with experience deploying and managing machine learning models in production environments.
  • Strong problem-solving skills and analytical thinking, with the ability to diagnose and troubleshoot complex infrastructure issues.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams and communicate technical concepts to non-technical stakeholders.

Benefits:

  • Competitive salary: The industry standard salary for Lead Machine Learning Infrastructure Engineers typically ranges from $200,000 to $300,000 per year, depending on experience and qualifications.
  • Comprehensive benefits package, including health insurance, retirement plans, and wellness programs.
  • Flexible work arrangements, including remote work options and flexible hours.
  • Generous vacation and paid time off.
  • Professional development opportunities, including access to training programs, conferences, and workshops.
  • State-of-the-art technology environment with access to cutting-edge tools and resources.
  • Vibrant and inclusive company culture with opportunities for growth and advancement.
  • Exciting projects with real-world impact at the forefront of AI-driven innovation.

Join Us: Ready to lead the charge in machine learning infrastructure engineering? Apply now to join our team and be part of the AI revolution!
