Apple Inc.

Cloud Monitoring SRE Manager

Apple Inc., Seattle, Washington, US, 98127


People at Apple don't just build products; they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.

The Apple Service Engineering (ASE) team builds and provides systems and infrastructure that fuel Apple's services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple's software developers build the products that our customers love. We are looking for a passionate and dedicated Site Reliability Engineering Manager to lead a team that focuses on providing our customers with the highest-quality Apple Services experience. Our services have to scale globally, stay highly available, and "just work." If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!

The Cloud Monitoring SRE organization is specifically tasked with enabling other teams to better understand their infrastructure and services by providing extraordinary observability capabilities. Keeping Apple services up and running 100% of the time is a critical job. Accurately monitoring the health of every application and piece of infrastructure that makes up the Apple ecosystem 100% of the time is an order of magnitude more challenging.

As a Site Reliability Engineering Manager for the Cloud Monitoring Team at Apple, you will build and mentor a team to improve the reliability and performance of the software systems that provide access to the services and infrastructure that run Apple. Our monitoring, alerting, and visualization platform analyzes billions of metrics per minute and forms the central nervous system of Apple's architecture.

Description

Apple Services Engineering infrastructure is BIG. Operating at our scale, across multiple geographically dispersed data centers and serving hundreds of millions of users, presents unique challenges. As a Site Reliability Engineering Manager, you will lead a team responsible for providing the platform for mission-critical observability services to maintain constant uptime, scale seamlessly, and allow new applications and services to thrive.

The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE Manager will not only support operations but also collaborate with the developers and architects within the team, aiding in design and assisting with implementation to improve stability, security, and scalability.

Responsibilities

- Lead SRE teams responsible for reliability and performance of cloud-based monitoring services
- Lead staging and production environments with the goal of maximizing availability
- Promote observability of systems for monitoring, alerting, and metrics reporting
- Advocate best practices of reliability engineering

Minimum Qualifications

- 5+ years of experience handling services in a large-scale environment
- Desire to build, grow, and mentor a team to meet both their career goals and the organization's goals
- Experience with hiring and leading engineers
- Experience with cloud computing technologies (particularly Kubernetes)
- Experience and confidence around incident response and incident management
- Experience with the Prometheus ecosystem
- Practical experience in Python and Bash scripting; theoretical knowledge of Go, Java, and/or Scala
- Strong drive to automate manual operations and to improve them through repeated iteration
- Strong sense of ownership and integrity demonstrated through clear communication and collaboration

Preferred Qualifications

- 2+ years of professional experience in an engineering leadership position
- Comfortable with open source configuration management and orchestration tools (such as Helm, Puppet, and Spinnaker)
- Developing and delivering multi-mode communications that convey a clear understanding of the unique needs of different audiences
- Knowing the most effective and efficient processes to get things done, with a focus on continuous improvement
- Rebounding from setbacks and adversity when facing difficult situations
- Anticipating and balancing the needs of multiple stakeholders
- Experience in running and scaling distributed systems in a public, private, or hybrid cloud environment
- Making sense of complex, high-quantity, and sometimes contradictory information to successfully solve problems
- Bachelor's or Master's degree in computer science or a similar field, or equivalent experience

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,000 and $272,300, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
