Converge Resources

Lead AI/ML Operations Engineer

Converge Resources, Houston, Texas, United States, 77246

Job Title: Lead Machine Learning Ops Engineer
Location: Houston, TX
Employment Type: Full-Time

Company Overview:
Our client, a leading innovator in the AI/ML space, is seeking a highly skilled Lead Machine Learning Ops (MLOps) Engineer to join their dynamic team in Houston. This position plays a critical role in ensuring the seamless integration of DevOps and MLOps practices to support the development and deployment of AI/ML applications. If you're passionate about cutting-edge technologies and looking to make a significant impact in AI/ML operations, this is the perfect opportunity for you.

Job Overview:
As the Lead MLOps Engineer, you will be responsible for driving the adoption and implementation of DevOps and MLOps practices to streamline AI/ML application enablement. Your expertise will ensure efficient model development, deployment, and monitoring processes, as well as the integration of cloud services that support AI/ML and analytics. This role requires a deep understanding of infrastructure operations (Infra Ops) and hands-on experience with AI/ML cloud services and data analytics tools.

Key Responsibilities:
- Lead the design, implementation, and maintenance of MLOps pipelines to support the development and deployment of AI/ML models.
- Collaborate with data scientists, AI/ML engineers, and DevOps teams to ensure scalable, reliable, and efficient AI/ML infrastructure.
- Implement and manage cloud-based data and analytics services for AI/ML workloads.
- Integrate MLOps practices into the organization's DevOps processes, focusing on automation, version control, and continuous deployment.
- Oversee model monitoring, retraining, and deployment workflows to ensure high-performing AI/ML applications.
- Optimize infrastructure for high availability, scalability, and performance of AI/ML applications.
- Provide technical leadership and mentorship to junior engineers on best practices in MLOps and AI/ML infrastructure management.

Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Strong background in DevOps and MLOps practices with 5 years of experience.
- Deep understanding of infrastructure operations (Infra Ops) in cloud-based environments.
- Proficiency with AI/ML cloud services such as AWS SageMaker, Google AI Platform, or Azure Machine Learning.
- Experience with CI/CD pipelines, containerization (Docker/Kubernetes), and version control (Git).
- Familiarity with AI/ML model lifecycle management, including data pipeline development, model training, and model serving.
- Hands-on experience with data analytics tools and AI/ML frameworks such as TensorFlow, PyTorch, or Scikit-learn.

Preferred Skills:
- Strong scripting skills (Python, Bash) for automation and infrastructure management.
- Experience with infrastructure-as-code (IaC) tools like Terraform, Ansible, or CloudFormation.
- Knowledge of distributed systems and scalable architectures.
- Familiarity with monitoring and logging tools (Prometheus, Grafana, etc.) for AI/ML model performance tracking.
- Experience working in Agile environments and collaborating across cross-functional teams.

Benefits:
- Competitive salary and performance-based bonuses.
- Comprehensive health, dental, and vision insurance.
- Opportunities for career advancement and professional development.
- Collaborative and innovative work environment.