Palo Alto Networks
Sr Staff Site Reliability Engineer (Cortex)
Palo Alto Networks, Santa Clara, California, United States
Job Description
Your Career
The Cortex team builds and delivers the industry’s most advanced SecOps platform, consisting of XSIAM, XSOAR, and XPANSE. As a member of the Cortex DevOps team, you will operate and maintain a large-scale GCP environment, including the design, implementation, and continuous enhancement of our comprehensive observability systems. To succeed in this role, you will need deep knowledge of modern observability and monitoring tools and practices, with experience managing high-cardinality metrics, implementing tracing, and operationalizing large-scale logging solutions. You will also collaborate closely with our engineering teams to develop innovative solutions that provide clear and actionable insights into our systems’ performance and health.
Your Impact
As a Senior Staff SRE with the Cortex Observability team, you will:
Cloud Expertise - Use your expertise in monitoring cloud platforms, particularly GCP, to optimize our infrastructure with cloud-native technologies
Monitoring Expertise - Improve monitoring processes, alerts, and metrics, and work with development teams to ensure that all of our services have the right monitoring and metrics in place so that we detect problems before our customers do
Incident Management - Leverage incident management processes to ensure efficient resolution of system issues and minimal impact on services
Automation - Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto-scaling
Continuously Improve - Stay up-to-date with cutting-edge technologies, evaluate their potential impact on our operations, and implement them when appropriate
On-Call - Join our DevOps team in providing follow-the-sun operational coverage for our production SaaS product
Collaborate - Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services