University of California - San Francisco
Observability Engineer
University of California - San Francisco, San Francisco, California, United States, 94199
F_IT COMMAND CENTER
Full Time
80620BR

Job Summary

An Observability Engineer within the Incident Command team plays a critical role in monitoring, evaluating, and optimizing the performance and health of IT systems and applications. This position is pivotal in ensuring that the IT infrastructure operates efficiently and can handle emerging issues swiftly and effectively.

The primary duties of an Observability Engineer include developing and maintaining monitoring tools and dashboards that provide real-time insight into the operational status of IT systems. This role involves collecting and evaluating metrics, logs, and traces to proactively detect, diagnose, and resolve performance bottlenecks or anomalies before they escalate into more significant incidents.

The Observability Engineer also partners closely with other IT and incident management teams to enhance incident response strategies, and is tasked with improving the observability framework by integrating advanced analytics and machine learning techniques to predict potential system failures and automate response processes.

The Observability Engineer will positively impact UCSF's operations and culture by ensuring UCSF's IT infrastructure is operable, secure, efficient, and effective in service of the University's mission. This role will execute UCSF's vision while modeling UCSF's culture and values.

The final salary and offer components are subject to additional approvals based on UC policy. Your placement within the salary range depends on a number of factors, including your work experience and internal equity within this position classification at UCSF. The salary range for this position is $120,300 - $194,600 (Annual Rate).

Department Description

University of California, San Francisco (UCSF) is distinguished as a leading academic healthcare organization, home to groundbreaking discoveries, world-class education, and exceptional healthcare services.

Infrastructure Services (IS) is the backbone of the technological infrastructure, assuring the technical services that enable the academic, medical, and research missions of the organization. IS values innovation and excellence in ensuring secure and efficient Information Technology (IT) services.

The Incident Command team within Infrastructure Services operates as a critical support system for the community of medical and health researchers. This team is dedicated to ensuring seamless access to essential IT resources, thereby enabling continuous and vital research work that has a profound impact on human health and well-being.

Required Qualifications

Bachelor's degree, or equivalent combination of experience/training, in one or more of the following fields: computer science, engineering, computer information systems, etc.
5 to 7 years of experience in information technology or IT Service Management.
Expertise in using advanced monitoring and observability tools such as Datadog, Spectrum, Prometheus, Grafana, Splunk, or New Relic.
Advanced ability to analyze and interpret complex data from various sources to diagnose issues and understand system behaviors.
Skilled in responding to and managing incidents efficiently.
Proficiency in automating monitoring tasks using scripting languages and data formats such as Python, Bash, PowerShell, Java, YAML, and XML.
Demonstrated experience using PagerDuty, OpsGenie, or comparable applications.
Excellent communication skills for effectively articulating incident details.
Advanced problem-solving skills.
Deep understanding of IT infrastructure including networks, servers, databases, logging, and cloud services.
Ability to document incidents and create detailed reports.
Ability to lead and collaborate with team members in high-stress situations.
Proficiency in risk management.
Understanding of compliance requirements relevant to IT operations.

Preferred Qualifications

Information Technology Infrastructure Library (ITIL)

About UCSF

The University of California, San Francisco (UCSF) is a leading university dedicated to promoting health worldwide through advanced biomedical research and excellence in patient care.

Pride Values

UCSF is a diverse community made of people with many skills and talents. We seek candidates whose work experience or community service has prepared them to contribute to our commitment to professionalism, respect, integrity, diversity and excellence.

Equal Employment Opportunity

The University of California San Francisco is an Equal Opportunity/Affirmative Action Employer.

Location: San Francisco, CA
Work Style: Fully On-Site
Shift: Days, 8am-5pm with on-call rotation