RIT Solutions, Inc.

DevOps Engineer

RIT Solutions, Inc., Atlanta, Georgia, United States, 30383

DevOps Engineer - Englewood, Colorado - Remote

Job Description

We are seeking a Mid-Level DevOps Engineer with Site Reliability Engineering (SRE) experience to contribute to the transition of Crew Management Applications to a web-based SaaS model hosted on AWS. The successful candidate will work under the guidance of a Senior DevOps Engineer, supporting critical system reliability, automation, and monitoring tasks while actively contributing to the successful implementation of key deliverables.

Required Skills

- DevOps, Site Reliability Engineering (SRE), Kubernetes, AWS EKS

Job Duties

- Support Key Deliverables: Assist in implementing metrics collection, developing dashboards, conducting reliability audits, and creating runbooks as outlined in the project goals.
- Collaboration: Work closely with the Senior DevOps Engineer, development teams, and support teams to ensure seamless operations and effective communication between stakeholders.
- CI/CD and Automation: Contribute to the development and optimization of CI/CD pipelines and automation scripts to support efficient and consistent deployments.
- Observability Implementation: Assist in configuring and maintaining monitoring solutions using OpenTelemetry and Grafana to enhance system visibility.
- Production Support: Participate in 24/7 Tier II production support on a rotational basis, addressing technical escalations and contributing to system stability.
- Documentation: Collaborate in the preparation of technical documentation, including runbooks, playbooks, and training materials for Tier I and II support teams.
- Dashboards and Metrics: Support the development of Grafana dashboards for monitoring services, including Kubernetes platform components and internally developed services.
- Issue Investigation: Assist in identifying and resolving issues reported from lower-tier support teams, ensuring timely resolution and minimizing customer impact.
- Game Day Scenarios: Participate in the execution of Game Day scenarios to prepare for potential system failures and improve operational readiness.
- Reliability Contributions: Work on tasks related to reliability audits, including submitting merge requests for simpler issues and escalating more complex problems to senior team members.

Job Requirements

- Experience: 3-5 years in DevOps, SRE, or related roles with a focus on cloud-hosted, microservices-based environments.
- Technologies: Familiarity with Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
- DevOps Practices: Basic understanding of CI/CD pipelines and infrastructure-as-code (IaC) principles.
- Incident Management: Experience in troubleshooting and resolving technical issues in production environments.
- Collaboration: Ability to work effectively as part of a team under the direction of senior engineers.
- Documentation: Basic skills in technical writing, including the ability to contribute to incident runbooks and operational playbooks.
- On-Call Readiness: Willingness to participate in 24/7 rotational production support as required.

Desired Skills & Experience

- Exposure to GitOps practices and tools like GitLab.
- Experience contributing to dashboards and monitoring systems for production environments.
- Familiarity with automated remediation processes and system optimization practices.
- Background in supporting SaaS environments or cloud migrations.