Confidential
Director of Operations
Confidential, Boston, Massachusetts, US, 02298
Director of Cloud Infrastructure & Operations

Are you a hands-on technical leader with a passion for cloud operations and expertise in the latest cloud technologies? Do you excel in fast-paced environments where uptime and efficiency are crucial? Are you someone who not only understands but actively works with cutting-edge advancements in automation, AI, and cloud optimization? If you enjoy rolling up your sleeves and staying current with emerging tools and trends in cloud operations, we invite you to join us as our Director of Cloud Infrastructure & Operations.

The Opportunity:
As Director of Cloud Infrastructure & Operations, you will lead the performance, scalability, and security of our multi-cloud infrastructure, reporting directly to the CTO. You will apply your deep technical skills to optimize resources, automate processes, and drive innovation with AI-driven solutions. This is a chance to collaborate with engineering and product teams, streamline operations, reduce costs, and ensure our infrastructure is future-ready while leading a high-performing cloud team.

This is a unique opportunity to lead cloud operations while playing a pivotal role in enabling the company's future AI-driven growth.

Responsibilities:

Team Leadership:
Lead and mentor a team of Site Reliability Engineers (SREs) responsible for Infrastructure as Code (IaC), CI/CD pipelines, and cloud provisioning across AWS, Azure, and GCP.

Technical Strategy:
Lead the cloud strategy, implementing best practices in Site Reliability Engineering (SRE) and DevSecOps to ensure scalability, simplicity, and security.

Collaboration:
Work closely with global engineering teams to optimize build and deployment pipelines, improve infrastructure provisioning, and drive cost optimization.

Process Improvement:
Promote a culture of learning and growth, providing opportunities for professional development and skill advancement for team members.

KPIs and Reporting:
Establish and monitor key performance indicators (KPIs) to track the health, cost, security, and effectiveness of the cloud infrastructure.

Audit and Documentation:
Conduct comprehensive audits of cloud resources (e.g., EC2 instances, Azure resources) and infrastructure tools (Puppet, Jenkins, Terraform). Maintain up-to-date, transparent documentation for all processes and systems, ensuring continuous updates and knowledge sharing across the team.

Tools and Automation:
Centralize management of cloud tools and develop a unified strategy for optimizing Infrastructure as Code (IaC) and CI/CD pipelines. Streamline automation processes using Jenkins, ArgoCD, Puppet, Terraform, and Ansible to reduce manual intervention and increase efficiency.

Cost & Performance Optimization:
Set up real-time monitoring tools to track cloud costs, performance, and usage patterns. Implement cost-saving strategies such as instance rightsizing, resource consolidation, and decommissioning underutilized resources to maximize efficiency.

Governance:
Establish robust governance policies, including role-based access controls (RBAC), tagging, usage tracking, and cost allocation to ensure accountability and security.

Security & Compliance:
Establish Security and Compliance best practices. Review and enhance security policies across cloud environments and tools. Implement access controls and encryption strategies to protect cloud operations.

Team Building and Development:
Recruit, mentor, and train internal staff to reduce dependency on external vendors. Foster a collaborative environment that promotes knowledge sharing and continuous improvement.

Operational Reviews:
Provide regular reports on key cloud operational metrics, including cost performance, system availability, resource utilization, and cloud infrastructure efficiency.

Minimum Qualifications:

Hands-on Technical Experience:
8+ years in cloud operations, with 5+ in leadership roles within SaaS.
Proven success optimizing AWS and Azure infrastructures.
Expertise in Linux as well as Windows operating systems.
Expertise with IaC tools (Terraform, Ansible, Puppet) and CI/CD pipelines (Jenkins, ArgoCD).
Strong knowledge of cloud cost optimization, instance management, and monitoring tools (CloudWatch, Azure Monitor).

Relevant Technical Skills:
Deep experience managing multi-cloud environments (AWS, Azure) and cloud-native tools.
Proficient in automation, orchestration, and implementing governance, RBAC, and security best practices.
Strong understanding of containerization, VM management, and scaling in cloud environments.

Good to have:
Experience managing cloud infrastructure for AI workloads, optimizing for machine learning models, and automating AI deployments.

Leadership:
Proven track record of leading and growing high-performing teams.
Skilled in managing cloud operations in high-pressure, fast-paced settings.
Ability to foster collaboration, innovation, and continuous learning.

Education & Certifications:
Bachelor's degree from an accredited college or university with major coursework in Computer Science, Information Technology, or equivalent. Equivalent work experience in a similar position may be substituted for educational requirements.
Relevant certifications (e.g., AWS Certified Solutions Architect, Azure Solutions Architect, Certified Kubernetes Administrator, Terraform Certified) are highly desirable.