Saviynt Inc.

Senior Manager – SaaS Operations Monitoring and Alerting

Saviynt Inc., El Segundo, California, United States, 90245


Saviynt’s Enterprise Identity Cloud helps modern enterprises scale cloud initiatives and solve the toughest security and compliance challenges in record time. The company brings together identity governance (IGA), granular application access, cloud security, and privileged access (PAM) to secure the entire business ecosystem and provide a frictionless user experience. The world’s largest brands trust Saviynt to accelerate digital transformation, empower distributed workforces, and meet continuous compliance.

Our Monitoring and Alerting team, part of SaaS Operations, combines Operations Excellence with the Development Experience to deliver highly available, resilient services at high scale using automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. The primary focus is on implementing monitoring solutions, establishing alerting processes, and ensuring timely responses to incidents.

The Manager, SaaS Operations Monitoring and Alerting, for our Cloud Services is a technical leader who plays a crucial role in overseeing the monitoring, alerting, and incident response functions tailored to cloud-based environments. This role requires enhancing and building a Monitoring and Alerting program by implementing best-in-class monitoring and alerting practices and tools, while fostering a collaborative and innovative culture within the team.

The team comes from diverse technical backgrounds, and the responsibilities offer a wide variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering and a desire to learn the other, or previous experience building and managing Monitoring and Alerting systems.
We are looking for a systems-thinking Technical Manager who has helped teams scale through production insights, operational automation, observability programs, developer guidance, and real-time metrics.

Objectives for the role:

- Lead and build a world-class Monitoring and Alerting Program for our Cloud Services to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics for a large-scale Cloud environment.
- Develop and implement a comprehensive monitoring and alerting strategy aligned with the organization's goals and objectives.
- Define key performance indicators (KPIs) and metrics to measure the health and performance of systems, networks, and applications.
- Identify, evaluate, and implement monitoring tools and technologies that suit the organization's needs.
- Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
- Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Develop and manage the budget for monitoring and alerting initiatives.
- Optimize costs while ensuring the effectiveness of monitoring solutions.
- Build software and systems to monitor platform infrastructure and applications at scale.
- Integrate monitoring tools with cloud service providers (e.g., AWS, Azure, Google Cloud) to capture relevant metrics and events.
- Build and execute an alerting strategy by establishing and refining alerting thresholds based on cloud service performance and business requirements.
- Implement automated responses for common issues and alerts.
- Analyze performance data to identify trends, anomalies, and opportunities for improvement.
- Utilize scripting and automation tools to enhance monitoring capabilities and streamline repetitive tasks.
- Provide ongoing training to the team on cloud monitoring tools, techniques, and best practices.
- Encourage and support team members in obtaining relevant certifications in cloud technologies.

The Expertise You Have

- Bachelor’s degree or higher in a technology-related field (e.g., Engineering, Computer Science) required; master’s degree a plus.
- 10+ years of professional experience in Monitoring and Alerting roles on major cloud platforms (AWS, Azure), including program leadership roles.
- 6+ years of experience with cloud platforms (AWS, Azure) and observability; experience building and operating highly resilient platforms in AWS cloud environments.
- 3+ years of experience managing Monitoring and Alerting teams.
- Deep understanding of cloud technologies (AWS, Kubernetes, Azure) and DevOps practices, and adept at implementing monitoring solutions that address the unique challenges of the cloud.
- Experience running observability, monitoring, and alerting programs for large-scale distributed systems.
- Strong technical leadership in the Monitoring and Alerting space.
- Expertise with logging and monitoring tools (preferred: Prometheus, Grafana, Datadog, AWS CloudWatch; related: Azure Monitor, Log Analytics, Fluentd).

The Skills You Bring

- Proven experience building Monitoring and Alerting programs.
- Experience implementing advanced observability practices and techniques at scale.
- Ability to lead, strategize, plan, and execute programs.
- Experience with instrumentation, plus the systems skills to build, operate, monitor, log, and alert on distributed services at scale.
- Solid understanding of Cloud Computing, Networking, and DevOps concepts.
- Experience with microservices and databases.
- Proven experience maintaining the scalability and resiliency of complex environments.
- Knowledge of network security (e.g., AWS Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints).
- Ability to lead the team through triage and root cause analysis, and to be decisive under pressure.
- Strong communication skills, with the ability to reach both technical and non-technical audiences.

The Value You Deliver

- Enhance and build a world-class Monitoring and Alerting program for our Cloud offerings.
- Define and execute a comprehensive reliability and observability strategy that works at scale, ensuring that Saviynt systems are always available when our customers need them.
- Hire and retain the best industry talent.
- Execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers.

Benefits

- Flexible work arrangements
- Medical, Dental, Vision, and Life Insurance
- 401(k)
- Unlimited Vacation
- Sick pay
- Daily catered lunches and healthy snacks at offices
- Team Socials

$150,000 - $190,000 a year

We offer you a competitive total rewards package, learning, and tremendous opportunities to grow and advance in your career. The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions, including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is $150,000 to $190,000 annually.

You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.

Saviynt is an amazing place to work. We are a high-growth cloud software company with phenomenal people that is building the most innovative identity platform in the world. Your time at Saviynt will be worthwhile. You will experience tremendous growth and learning while being part of something you are helping to define and build from the ground up. Through challenging yet rewarding work, you will be able to directly impact our clients, all within a welcoming and positive work environment. If you're resilient and enjoy working in a dynamic, high-growth environment, you belong with us!
