LEDGENT Technology & Engineering - Roth Staffing Companies, L.P.
Site Reliability Engineer III (Onsite)
LEDGENT Technology & Engineering - Roth Staffing Companies, L.P., Newport Beach, California, US, 92659
We are seeking a Lead Site Reliability Engineer for a 6+ month project in Newport Beach, CA. This position is onsite.

LEAD SRE
As a Lead SRE, you will provide technical leadership, direction, and accountability for platform engineering, system design, and end-to-end implementation to meet and exceed the product or platform non-functional requirements, including quality, security, reliability, availability, and performance. The main responsibilities include, but are not limited to, optimizing design and engineering for new systems and enhancements, including processes and day-to-day activities, to reliably support product rollout and operation in production. This role will include oversight for production operations of our portfolio of systems, as well as development/engineering of solutions to optimize system reliability and automation.

How you'll help move us forward:
Lead the design, build, and implementation of orchestration and tooling solutions to ensure that repetitive administration tasks are performed at a high level of efficiency and free of defects.
Establish best practices for structuring, automating, building, deploying, and monitoring complex distributed software products and environments.
Ensure the reliability and traceability of software releases and of deployments of software and infrastructure changes.
Create and maintain platform architecture and design specifications to aid development, testing, and maintenance of software environments.
Design and implement monitoring and recovery tools to provide for site high availability (HA) and disaster recovery (DR).
Design and develop highly available infrastructure and platform components to meet the needs of our growing and evolving product lines.
Design and implement security engineering best practices in all our deployed platforms and environments.
Triage alerts and diagnose/resolve critical issues, managing the implementation of changes.
Manage the coordination, documentation, and tracking of critical incidents and corresponding root cause analysis, ensuring rapid and complete issue resolution and appropriate closed loop to customers and other key stakeholders.
Collaborate with Delivery Engineers and DevExp Engineers to enhance and implement continuous integration/continuous deployment orchestration systems to reduce friction for software delivery to production.
Lead, grow, and mentor other SRE team members.
Evangelize the DevSecOps culture and SRE mindset, and mentor others about reliability and best practices.
Identify and work with other engineering disciplines to implement opportunities for:
  Automation
  Signal-to-noise reduction
  Prevention of recurring issues, and other actions to reduce time to mitigate service-impacting events and increase the productivity of cloud operations and development resources.
Maintain a strong understanding of IaaS, PaaS, and SaaS offerings while building and maintaining a state-of-the-art, cloud-based environment for large-scale data processing.
Design and implement processes, technology, and automation for performance testing.
Ensure that implementations and solutions are fully documented, and that solutions are deployed with fully operationalized processes to support the solution lifecycle.

The experience you bring:
10-15 years of experience in infrastructure, system engineering, and software engineering.
Advanced knowledge in software engineering in test, testing automation frameworks, and tools for application and/or any-as-code (infrastructure, configuration, development tools such as documentation or diagram as code).
Advanced knowledge in at least 3 of the following key areas: Cloud native and IaaS Architecture (performance testing, monitoring, operations), Design (compliance, security).