Site Reliability Engineer (not open for C2C or sponsorship)
Vaco, Seattle, WA, United States
Job Summary
In this role as an Associate Site Reliability Engineer, you will be an integral member of a dynamic SRE/DevOps team that continuously improves our AWS cloud deployment platform with an "automation first" approach.
Responsibilities
Drive team initiatives to continuously refine AWS deployment practices for improved reliability, repeatability and security.
Work closely with the development teams to automate deployment and configuration of infrastructure.
Design effective monitoring and alerting (for conditions such as application errors and high memory usage) and log aggregation approaches (to quickly access logs for troubleshooting or to generate reports for trend analysis) that proactively notify business stakeholders of issues and communicate metrics, working closely with those stakeholders.
Write code and scripts to automate provisioning and configuration of AWS services, using tools and languages including AWS CLI / API, Terraform, Ansible, Python, and Bash.
Configure build pipelines to support automated testing and deployments using tools including Jenkins, CircleCI, and GitHub Actions.
Help refine, implement, and verify DevSecOps security practices (including regular security patching, minimum-permissions accounts and policies, and encrypt-everything) in compliance with Health IT, government, and other standards and regulations, using tools such as the AWS security stack (GuardDuty, Systems Manager, Config), Veracode, and SonarQube to analyze and verify compliance.
Document and diagram deployment-specific aspects of architectures and environments, working closely with Software Engineers, Software Engineers in Test, and others in DevOps.
Troubleshoot issues in production and other environments, applying debugging and problem-solving techniques (e.g., log analysis, non-invasive tests), working closely with development and product teams.
Qualifications
2+ years of cloud administration experience (AWS, Azure, GCP) OR 2+ years of software engineering experience in a modern, high-level language (Ruby, Java, Python, etc.).
Strong experience developing and / or deploying Docker containers on Kubernetes (Helm, Kustomize, etc.).
Working knowledge of IaC / configuration management tools such as Terraform, Ansible, or Puppet.
Recent experience with setup, configuration, and monitoring of RDBMS and NoSQL datastores.
A strong understanding of Linux administration, including Bash scripting.
Experience in automation using Go or Python.
Experience with log aggregation tools such as Datadog, ELK, or Splunk.
Preferred Qualifications
Bachelor's degree in science, technology, engineering, or a similar field is desired.
Experience in HIPAA/SOC 2 environments.