McKinsey & Company

Senior Cloud Infrastructure Engineer

McKinsey & Company, Atlanta, Georgia, United States, 30383


McKinsey and Company Inc. US is seeking a Senior Cloud Infrastructure Engineer in Atlanta, GA.

Responsibilities:
- Design and implement CI/CD pipelines that automate the provisioning of infrastructure for container orchestration systems such as EKS by deploying infrastructure-as-code templates using Terraform, GitHub Actions, and CircleCI.
- Manage the lifecycle, including creation, operation, upgrade, and deletion, of different types of Kubernetes clusters, including EKS in AWS, kubeadm-based, and on-prem clusters, using Terraform, GitHub Actions, and AWS Service Catalog.
- Collaborate with product teams to develop, test, and deploy software that automates the provisioning of infrastructure for Kubernetes-related resources using programming languages such as Golang or Python.
- Design and implement a GitOps workflow that automatically deploys all applications to Kubernetes clusters using tools such as ArgoCD, GitHub, and Helm charts.
- Create automated workflows that provision compute infrastructure for AI/ML workloads in AWS using GitHub Actions and NVIDIA's GPU Operator.
- Develop an observability framework and runbook automation for infrastructure components supporting Kubernetes workloads and AWS resources.
- Define and maintain performance metrics such as SLIs and SLOs for platform infrastructure, coordinating with other stakeholders and using tools such as Dynatrace.
- Develop the instrumentation framework for platform infrastructure, including logging and monitoring, using tools such as Dynatrace, Splunk, and Prometheus.
- Design and implement solutions that eliminate toil in secrets management, rotation, and synchronization across platforms such as Vault, AWS Secrets Manager, and Kubernetes clusters using Golang, ArgoCD, and GitHub Actions.
- Develop an alerting framework for platform infrastructure using tools such as Dynatrace and VictorOps, creating alerts and dashboards that reduce incidents and outages.
- Maintain security compliance by automating the patching cycle for cloud infrastructure using Bash scripts, Python, and other tools, according to the standards set in tools such as Wiz.
- Resolve production availability incidents for platform infrastructure that often span multiple teams, document root cause analyses, and create solutions to prevent recurrence.
- Provide cost-optimization solutions for cloud infrastructure using tools such as Cloudability, and create automation scripts using languages such as Bash, Python, or Golang.
- Collaborate with SRE and product teams on sprints and project tasks in an agile manner.

Qualifications:
Qualified applicants for this position must possess a Master's degree or foreign degree equivalent and three (3) years of experience in Kubernetes administration and lifecycle management; OR a Bachelor's degree or foreign degree equivalent and five (5) years of experience in Kubernetes administration and lifecycle management. Experience must include:
- IaC languages: Terraform or CloudFormation
- Proficiency in programming languages: Golang or Python
- Knowledge of AWS cloud infrastructure
- CI/CD tooling and version control: CircleCI, GitHub Actions, and GitHub
- GitOps deployment workflows and tools such as Flux or ArgoCD
- Using at least one of the following monitoring, security alerting, and data analytics tools: Aqua, Splunk, Prometheus, Grafana, New Relic, or Dynatrace
- Networking knowledge: load balancing, network security, and standard network protocols
- Participation in the product lifecycle process, including contributing to or creating product and technical requirements at the enterprise level and preparing detailed design documents covering security, transaction, capacity, and bandwidth models, systems definition, and operational procedures
- Familiarity with agile concepts and experience working in an agile team culture

Email your resume to CO@mckinsey.com and refer to Job # 8070709.