Intuit Inc.

Group Manager, SRE Software Engineering (Site Reliability Engineering)

Intuit Inc., Atlanta, Georgia, United States, 30383

Mailchimp is a leading marketing platform for small businesses. We empower millions of customers around the world to build their brands and grow their companies with a suite of marketing automation, multichannel campaign, CRM, and analytics tools.

We're looking for an Engineering Leader to lead our Site Reliability Engineering team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our application, which is used by both internal engineers and external customers. You will collaborate with cross-functional teams to design, implement, and maintain robust, resilient systems, and you will drive a cultural shift toward operational excellence across the organization.

We are looking for experienced leaders with deep technical expertise grounded in years of hands-on development on high-scale, highly available systems that achieved outstanding levels of operational excellence, and who have applied those learnings at scale in their organizations.

Intuit Mailchimp is a hybrid workplace, giving employees the opportunity to collaborate in person with team members in our Atlanta, GA or New York, NY offices two or more days per week.

Responsibilities

Drive a mindset of operational excellence across the Mailchimp Engineering organization
Design and implement strategies for site reliability operations, including automation, monitoring, and maintenance processes
Coach and develop engineers responsible for site reliability and performance
Stay up to date with industry trends and emerging technologies to drive continuous improvement
Coordinate with cross-functional teams, including engineering, operations, support, and product, to ensure the reliability and consistency of our services
Collaborate with other operational excellence teams across Intuit on shared best practices and learnings
Provide technical guidance and mentorship to team members and stakeholders

Minimum Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent experience
8+ years of experience in Site Reliability Management, with 3+ years in a management role
Proven track record of managing teams of engineers and developing strategies for site reliability and performance operations
Excellent communication skills and the ability to lead cross-functional teams and stakeholders
Proactive, results-driven attitude, with a passion for building reliable, scalable, and performant systems
Proficiency in programming languages such as PHP, Go, Python, and Java
Strong understanding of Linux/Unix systems and network protocols
Experience with cloud platforms such as AWS and/or Google Cloud
Expertise in containerization and orchestration technologies such as Docker and Kubernetes
Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Splunk)
Experience with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI, GitHub Actions)
Knowledge of SQL database management systems (e.g., MySQL) and caching technologies
Familiarity with infrastructure as code (IaC) and configuration management tools (e.g., Terraform, Puppet)
