Highbrow LLC

DevOps Engineer with AI Ops Experience

Highbrow LLC, Atlanta, Georgia, United States, 30383

Job Title: DevOps Engineer with AI Ops Experience

Employment Type: W2
Duration: Long Term
Visa Type: All visa types eligible for W2 employment
Location: Atlanta, GA (Day-1 Onsite)

Experience Required:

5+ years of experience in DevOps engineering, with at least 3 years specializing in AI Ops or supporting ML/AI model deployment and infrastructure.

Job Description:

Proven experience in designing, implementing, and managing CI/CD pipelines and MLOps frameworks to automate AI/ML workflows.

Technical Skills:

Proficiency in cloud platforms (AWS, GCP, Azure), with hands-on experience deploying AI/ML models and using managed AI/ML services (e.g., AWS SageMaker, Google AI Platform).
Strong skills in containerization and orchestration tools such as Docker and Kubernetes, especially for deploying machine learning models at scale.
Experience with infrastructure-as-code tools such as Terraform, CloudFormation, or Ansible to provision and manage cloud and on-premises environments.
Proficiency in CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI) to build automated pipelines for AI/ML model training, testing, and deployment.
Solid understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) for model performance tracking and infrastructure observability.
Strong programming and scripting skills in Python and Bash, and fluency with YAML configuration, for automating workflows and integrating services.

AI Ops and MLOps Skills:

Experience with MLOps best practices, including model versioning, automated retraining, and model governance, for reliable and reproducible AI pipelines.
Hands-on experience with model tracking and monitoring tools (e.g., MLflow, Kubeflow, or TFX) to track model performance, drift, and retraining needs.
Familiarity with data pipeline and orchestration tools (e.g., Apache Airflow, Prefect) for managing data and model workflows.
Knowledge of model deployment strategies (e.g., blue-green deployments, canary releases) to ensure reliable AI/ML model deployment with minimal downtime.
Experience with A/B testing and experiment tracking to evaluate model performance in production and measure the impact on business KPIs.

DevOps and Automation Skills:

Ability to design and manage scalable infrastructure for machine learning workloads, ensuring cost efficiency, performance, and security.
Proficiency in automating testing and deployment processes for data and model pipelines to support fast, reliable releases.
Familiarity with serverless architectures and cloud-native tools for AI that allow flexible and efficient resource management.
Experience with security best practices, including role-based access control, data encryption, and compliance requirements for data-sensitive applications.

Communication and Collaboration Skills:

Excellent communication skills, with the ability to collaborate closely with data scientists, ML engineers, and software development teams.
Proven ability to document infrastructure, CI/CD pipelines, and MLOps processes, ensuring transparency and knowledge sharing across teams.
Strong problem-solving skills and a proactive approach to troubleshooting, particularly in resolving deployment and performance issues.
Ability to train and mentor team members on MLOps tools, best practices, and model deployment techniques.

Additional Qualifications:

Experience with data security and governance standards, especially as they apply to machine learning applications in regulated industries.
Familiarity with AI ethics and compliance, including model fairness, transparency, and risk management.
Knowledge of advanced monitoring and alerting tools and techniques to ensure the reliability of AI systems in production.
Strong interest in staying up to date on the latest advancements in MLOps and AI Ops to continuously improve infrastructure and processes.
