Bright Horizons Children's Centers

Site Performance Engineer

Bright Horizons Children's Centers, Newton, Massachusetts, United States, 02165

What you will be doing:

The Site Reliability Engineer role requires a thorough understanding of application architecture, infrastructure, and non-functional requirements in order to identify and address production workloads effectively. Responsibilities include monitoring systems in both production and non-production environments, troubleshooting and resolving technical issues, and actively preventing potential problems. Experience in performance engineering and testing, including load, scalability, and performance testing of cloud-based applications, is essential. The role also involves using APM tools and running automated performance tests via CI/CD pipelines. Additionally, Site Reliability Engineers collaborate closely with software developers to optimize system performance, conduct performance analysis, and implement long-term performance improvement strategies.

What we hope you will bring to this role:

- Troubleshoot, isolate, and resolve application code issues and other technical problems (hardware, software, infrastructure, and network).
- Actively monitor systems in production and non-production environments and alert the core group to prevent issues from happening.
- Design, script, configure, and run performance tests to validate system performance and stability.
- Perform root cause analysis of performance issues and work with developers on corrective actions.
- Work closely with software developers to define and implement software frameworks, libraries, and tools that maximize performance and application productivity; identify performance bottlenecks and development enhancements.
- Conduct performance analysis, benchmarking, and modeling to identify performance bottlenecks, optimize system parameters, and guide architectural enhancements.
- Maintain and execute test scripts and reporting.

Job Requirements - Education/Experience

- Bachelor's Degree in Computer Science or Engineering - Required
- 5 years of application monitoring/observability, building dashboards, identifying and establishing SLAs, logging, and tracing - Required
- 3 years of troubleshooting, isolating, and resolving application code issues and other technical problems (hardware, software, infrastructure, and network) - Required
- 3 years of actively monitoring systems in production and non-production environments and alerting the core group to prevent issues from happening - Required
- 3 years of experience with APM tools such as Datadog/Dynatrace/AppDynamics/New Relic/FullStory - Required
- 3 years of experience running automated performance tests via CI/CD pipelines (on platforms like GitHub etc.) - Required
- 3 years of LoadRunner and/or JMeter - Required

Additional Job Requirements

- Leadership skills, including but not limited to mentoring, coaching, and training abilities.
- Very good problem-solving and communication skills, and the ability to work in a dynamic environment with minimal supervision.
- Strong experience in understanding application architecture, infrastructure, and non-functional requirements, and in identifying production workloads.
- Experience with application monitoring/observability, building dashboards, identifying and establishing SLAs, logging, and tracing.
- Strong experience in performance engineering and testing on cloud and on-prem for tiered environments.
- Experience in load, scalability, and performance testing of cloud-based applications.
- Experience with APM tools such as Datadog/Dynatrace/AppDynamics/New Relic/FullStory.
- Experience running automated performance tests via CI/CD pipelines (on platforms like GitHub etc.).
- Experience with container technologies such as Docker or Kubernetes.
- Strong analytical and problem-solving skills.
- Experience conducting resilience tests and chaos engineering.
- Knowledge of API testing tools such as ReadyAPI and Postman.
- Experience with industry-standard tools like LoadRunner and/or JMeter is a must.
- Good work experience in Agile/Scrum development projects in a distributed environment.
- Experience in observability/site reliability engineering, including tracking and managing performance issues to resolution.
- Experience working with other performance testing engineers on implementing a Continuous Performance program to support long-term application reliability and growth.
