Illumio
Senior Engineer, Cloud (Observability Lead)
Illumio, Greendale, Wisconsin, United States, 53129
Illumio protects your network and secures your cloud with the leading Zero Trust Segmentation platform. Stay safe with our advanced cloud security solutions.
Location:
Onsite, Sunnyvale, California (5 days a week in the office)
Onwards Together!
Illumio, the pioneer and market leader of Zero Trust segmentation, prevents breaches from becoming cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk.
Illuminate the future with Illumio and join a team that’s passionate about developing cutting-edge security solutions that protect the world's most critical assets.
Your Impact:
We are seeking a Senior Engineer for our Cloud team with a strong focus on observability to join our engineering team as the Observability Lead. In this role, you will champion initiatives to enhance the reliability, visibility, and operational readiness of our production systems. You will collaborate closely with engineers to catalog services, improve logging practices, reduce log noise, and integrate additional metrics across all applications. Additionally, you will develop runbooks, build dashboards, and manage PagerDuty configurations and escalation workflows.
- Serve as an advocate for observability practices within the engineering team, promoting operational best practices and reliability.
- Catalog all production services, documenting critical details for operational visibility and management.
- Collaborate with engineering teams to develop and implement a comprehensive observability plan, ensuring metrics are integrated into all services.
- Enhance logging practices where needed, reduce log noise, and ensure meaningful insights are captured.
- Add and refine metrics across applications to improve operational visibility and performance tracking.
- Develop detailed runbooks for critical alerts and incidents, facilitating efficient response processes.
- Build and maintain dashboards that offer insights into SLAs, performance, and business metrics for engineering and product teams.
- Set up and manage PagerDuty alerts, define on-call duties, and establish incident escalation paths.
- Continuously improve alerting, logging, and monitoring processes to enhance service reliability and reduce unnecessary noise.
Your Toolkit:
- Proven experience in a DevOps or observability-focused role, concentrating on production service management and operational excellence.
- Prior experience working with microservices in a production environment is a must.
- At least 5 years of experience managing large numbers of instances in public clouds such as AWS, Azure, or GCP.
- Strong expertise in observability practices and tools (e.g., Prometheus, Grafana, Datadog).
- Experience enhancing logging, reducing log noise, and integrating critical metrics into services.
- Proficiency in building and managing dashboards and monitoring tools.
- Expertise in setting up and managing PagerDuty alerts, with on-call rotation and escalation management knowledge.
- Strong collaboration skills to work closely with engineering teams, advocating for observability best practices.
- Familiarity with cloud platforms (AWS, GCP, Azure) and modern CI/CD processes.
- Automation scripting or coding experience (Python, Go, or similar).
- Knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation).
- Excellent problem-solving skills and attention to detail in managing complex systems.
Compensation:
$161,000 USD - $185,000 USD
The pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include the responsibilities of the job, education, location, experience, knowledge, skills, abilities, internal equity, alignment with market data, and applicable laws.
At Illumio we offer a wide range of benefits to our eligible team members. Our benefit programs vary by location and can include Medical, Dental, Vision Coverage – Health and Dependent Savings Accounts – Life and Disability Programs – Paid Parental Leave – Voluntary Benefit Programs – Company Sponsored Wellness Program – Wellness Reimbursement Program – Retirement Savings – Equity Opportunities – Paid Time Off and Paid Holidays – Employee Incentive Program.
Our Commitment:
Illumio believes that an environment of unique backgrounds, experiences, viewpoints, and individual contributions drives our success and makes us stronger together. We are dedicated to creating and maintaining a diverse culture and emphasizing inclusion and belonging.