Ascentt
Lead DevOps/MLOps Engineer
Ascentt, Dallas, Texas, United States
About the Role

We are seeking a highly skilled MLOps Engineer to join our team and lead the development, deployment, and management of machine learning pipelines in production. The ideal candidate will bridge the gap between data science, machine learning, and infrastructure engineering, ensuring scalable, secure, and efficient workflows. We are looking for a strong engineer with expertise in GitHub Actions, Docker, and TypeScript who can build robust and scalable automation pipelines.

Responsibilities

MLOps Pipeline Development: Design and implement CI/CD pipelines for ML model deployment across development (dev), QA, and production environments using tools like GitHub Actions. Automate workflows for seamless model deployment and monitoring, including serverless endpoints.

Model Serving and Infrastructure: Build scalable and secure model-serving solutions using platforms like AWS SageMaker. Implement cross-domain logging, monitoring, and governance to track model performance.

Cloud Engineering and Cost Optimization: Leverage cloud platforms (e.g., AWS SageMaker) for efficient infrastructure management. Work closely with cloud engineers to optimize costs and ensure resource scalability.

Collaboration and Best Practices: Collaborate with data science teams to manage experiment tracking (e.g., using Weights and Biases) and model registration workflows. Create reusable scaffolding templates for MLOps pipelines, including linting, pre-commit hooks, and secret checks. Write efficient and scalable scripts and modules in TypeScript for automation and workflow improvements.

Support and Troubleshooting: Provide expertise for multi-domain MLOps platforms, ensuring smooth operations for diverse data science teams. Address and resolve challenges during deployment and operation, including handling change requests.

Qualifications

Technical Skills: Proficiency with GitHub Actions, Docker, and TypeScript. Strong experience with AWS SageMaker or similar platforms. Hands-on expertise in CI/CD tools, automation, and orchestration. Knowledge of cloud cost management and strategies to optimize spending. Familiarity with experiment tracking tools like Weights and Biases. Hands-on experience with containerization tools (Docker) and infrastructure as code tools like Terraform.

Problem-Solving Skills: Ability to develop strategies for scalable multi-domain MLOps platforms. Experience in managing competing priorities and delivering efficient solutions.

Experience: Demonstrated expertise in MLOps, cloud engineering, and infrastructure management. Familiarity with team collaboration tools, processes, and frameworks for scalable machine learning deployments.

Preferred

Familiarity with SageMaker multi-domain setups and deployment workflows. Experience in managing hybrid teams spanning data science and cloud engineering. Knowledge of cost-saving techniques in cloud computing environments.

Soft Skills

Excellent communication skills to collaborate with cross-functional teams. Strong problem-solving and troubleshooting abilities. Ability to manage tasks and projects independently while maintaining alignment with team goals.