ZipRecruiter
Cloud Site Reliability Engineer (SRE)
ZipRecruiter, Washington, District of Columbia, US, 20022
Job Description
We are seeking a skilled Cloud Site Reliability Engineer who is legally authorized to work in the US. Do you have an interest in infrastructure engineering, software architecture design, and cloud computing? SRE/Cloud Engineers are responsible for creating infrastructure designs and guiding the development and implementation of cloud applications, systems, and processes. You will help build cutting-edge proofs of concept and collaborate with other engineering teams during implementation. The ideal candidate can handle multiple tasks in a fast-paced team environment.
Key Job Functions
• Design an automation implementation strategy for business and IT processes using the latest AWS technologies.
• Design, implement, and maintain robust infrastructure pipelines and deployment processes for machine learning models and AI applications, ensuring reliability and efficient resource utilization.
• Build and maintain a robust infrastructure platform to support and enhance business processes and engineering capabilities through self-service deployment capabilities.
• Assist in automating Infrastructure as Code in Amazon Web Services with CloudFormation or Terraform.
• Build CI/CD pipelines using the AWS suite of services to orchestrate provisioning and deployment of both large- and small-scale systems.
• Write unit tests and integration tests to validate infrastructure code.
• Identify engineering defects in the existing code base and continuously improve code quality.
• Establish and mature standards for observability, monitoring, and troubleshooting to detect and respond to production issues effectively and proactively.
• Automate routine operational tasks to improve the efficiency of MLOps processes and reduce manual intervention.
• Support incident response processes, including troubleshooting and root cause analysis of system failures and performance degradation.
• Define and document best practices and strategies regarding application development and deployment activities.
Qualifications:
Education
• Bachelor’s degree in Computer Science, MIS or related technical field required.
Minimum Experience
• 3-5 years of infrastructure and DevOps-related engineering experience.
Specialized Knowledge & Skills
• Proficiency in at least one high-level scripting language (PowerShell, Python, Bash, etc.).
• Legally authorized to work in the US.
• Working knowledge of Continuous Integration and Continuous Deployment methodologies.
• Experience in microservice migration, including containerization and serverless architecture.
• Experience building custom templates with AWS CloudFormation (CFT).
• Experience building pipelines with CI/CD tools.
• Experience architecting for high availability and cost-effective scalability in AWS.
• Knowledge of a wide variety of open-source technologies and cloud services.
• Experience with Linux/Unix and Windows administration.
• Hands-on experience following an iterative, agile SDLC.
• Experience with system capacity planning, as well as functional configuration and audit.
• Must be available on a rotational basis for occasional weekend or after-hours work related to infrastructure releases, maintenance, and patching, as needed.