ThousandEyes

Senior Manager, Site Reliability Engineering

ThousandEyes, San Francisco, California, United States, 94199

Who We Are

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, Internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues – before they impact end-user experiences. ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.

About the role

This role is the Senior Manager for the Infrastructure SRE team at ThousandEyes. This globally-distributed team architects and owns all things infra (up to the Kubernetes layer). Expected areas of expertise include traffic management and ingress, DNS, WAF, regional VPC management, capacity planning, disaster recovery planning, and foundational Infrastructure as Code using Terraform and its friends. In addition, the team is composed of experts on availability, latency, performance, efficiency, change management, observability, and emergency response, with a strong focus on automation and delivering solutions focused on developer efficiency.

What You'll Do

Lead and inspire a talented, globally-distributed SRE team, fostering a culture of innovation, collaboration, and excellence.
Drive the strategic vision for the build, implementation, and management of our cloud-based infrastructure and systems, with a “reliability and security” mindset.
Collaborate closely with cross-functional teams, including SRE peer groups, software engineering, product management, and security to define and implement continuous improvement on our platform and stack.
Provide domain-expertise oversight, guidance, and support to the growing sets of product teams within ThousandEyes.
Stay current with industry best practices, tooling, and security & automation, and apply this knowledge to enhance the security posture of our platform and systems.
Drive operational excellence in operations and security processes.
Mentor and develop engineering talent, fostering a culture of continuous learning and professional growth within the Infrastructure SRE team.

Qualifications

You have led a team of 6+ engineers, ideally across time zones.
You have a total of 5+ years of experience building and supporting mission-critical services with a focus on automation, availability, and performance.
You have experience building and/or operating in public cloud environments.
You bring a strong focus on developer efficiency and reduced friction in developing scalable Infrastructure as Code.
You have familiarity and lived experience with robust incident response processes.
You can provide strong technical vision for your team and ensure consistent delivery on objectives.
You have experience formulating a team's technical strategy and roadmap; you've collaborated and partnered effectively with several other teams to execute on shared goals.
You have worked on large-scale distributed systems, including multi-tiered architectures.
You understand how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters.
