Lead Machine Learning Engineer
Clearpoint, Houston, TX, United States
TITLE: Lead Machine Learning Engineer
LOCATION: Houston, Texas
TYPE: Direct Hire
SALARY: $195,000 - $230,000
SUMMARY:
The Lead Machine Learning Engineer will be responsible for establishing DevOps and MLOps processes throughout the Corporate Data & Analytics Team to support AI/ML applications. The role drives the adoption of DevOps and MLOps best practices, resulting in faster deployment of AI/ML and data-driven solutions that satisfy business requirements. This role requires extensive experience in DevOps and MLOps, a thorough grasp of InfraOps, and a comprehensive understanding of AI/ML, data, and analytics cloud services and components. You will work closely with data scientists, machine learning engineers, data engineers, software engineers, and platform architects, using cutting-edge tools and technologies to deploy and maintain AI/ML and advanced analytics solutions and to integrate analytic models with existing business applications.
DUTIES:
- Enhance current DevOps processes to improve the entire AI/ML application development lifecycle.
- Collaborate with development and cloud platform teams to verify that the infrastructure satisfies the applications' needs.
- Establish and maintain best practices for cloud security, compliance, and cost efficiency.
- Create automated build and deployment methods to enable continuous delivery of software releases, and improve the existing CI/CD pipelines for AI/ML application development and deployment.
- Collaborate with data scientists, data engineers, data analysts, software engineers, IT professionals, and stakeholders to accelerate the deployment of AI applications using CI/CD pipelines while maintaining the applications' SLAs on a single platform.
- Design, develop, and maintain infrastructure using infrastructure-as-code tools such as Terraform, Ansible, and CloudFormation.
- Use existing Databricks CLI scripts to manage the Databricks platform as code for AI/ML data pipelines (batch processing, batch streaming, and streaming) and model serving endpoints.
QUALIFICATIONS:
- 10+ years of experience in software engineering, with a strong background in DevOps and Infrastructure as Code supporting machine learning and data science workloads.
- Expertise in version control tools and platforms such as GitLab, GitHub, Azure DevOps, and Bitbucket, and familiarity with branch-level repository management.
- Experience deploying machine learning solutions on cloud platforms (e.g., AWS, Azure, or GCP); experience with Databricks and AWS is preferred.
- Proficiency with GitHub Actions to automate testing and deployment of data and ML workloads from the CI/CD provider to Databricks.
- Strong knowledge of infrastructure automation tools such as Terraform, Ansible, and CloudFormation.
- Experience with data processing frameworks, tools, and platforms such as Databricks, Apache Spark, Kafka, Flink, and AWS cloud services for batch processing, batch streaming, and streaming.
- Experience containerizing analytical models with Docker and orchestrating them with Kubernetes or other container orchestration platforms.
- Technical expertise across all deployment models on public cloud, private cloud, and on-premises infrastructure.
- Experience with event-driven and microservice architectures for enterprise-level platform development.
- Expertise in Linux and knowledge of networking and security concepts.
- Bachelor's degree in Computer Science, Computer Engineering, Information Technology, Software Engineering, or an equivalent technical discipline.