Divisions Maintenance Group

Software Development Engineer - Infrastructure II

Divisions Maintenance Group, Cincinnati, Ohio, United States, 45208

Description

Job Description:

As an Infrastructure Reliability Engineer II, you will play a key role in ensuring the availability and performance of our infrastructure and applications. You will also be responsible for incident response and triage, ensuring a swift and effective response to security incidents and operational disruptions.

Key Responsibilities:

Kubernetes and Container Orchestration: Maintain and optimize our Kubernetes-based infrastructure and Docker containers to ensure high availability and scalability.
Cloud Infrastructure: Work with AWS services to design, implement, and manage scalable and resilient cloud infrastructure.
CI/CD Pipelines: Manage and enhance CI/CD pipelines using tools like ArgoCD, Argo Workflows, and Helm for efficient software delivery.
Scripting and Automation: Develop and maintain automation scripts using Python, Shell, or other scripting languages to streamline operational tasks.
Incident Response And Incident Triage: Lead incident response efforts, including detection, containment, eradication, recovery, and post-incident analysis to ensure the security and integrity of our systems. Conduct initial triage of security incidents and operational disruptions to assess severity, gather information, and prioritize actions effectively.
Configuration Management: Utilize configuration management tools such as Ansible and Chef to ensure consistent and reliable system configurations.
Observability: Implement and maintain observability solutions using Prometheus, Grafana, Datadog, or similar tools to monitor and troubleshoot system performance.
Collaboration: Collaborate closely with development teams to identify and resolve issues, improve application performance, and enhance system reliability.
Documentation: Create and maintain comprehensive documentation for processes, configurations, and best practices.

Qualifications:

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
Minimum 5 years of experience in a Site Reliability Engineering or DevOps role.
Strong experience with Kubernetes and Docker containerization.
Proficiency in AWS services and cloud infrastructure management.
Hands-on experience with CI/CD platforms like ArgoCD, Argo Workflows, Spinnaker, and Helm.
Proficiency in scripting languages such as Python and Shell.
Knowledge of configuration management tools, including Ansible and Chef.
Familiarity with observability platforms like Prometheus, Grafana, and Datadog.
Excellent troubleshooting skills and the ability to work collaboratively with cross-functional teams.
Experience in incident response and triage, with the ability to assess, contain, and remediate security incidents.
Strong communication skills and the ability to document processes effectively.

Good To Have:

Development experience in C# or Java.
Database knowledge in PostgreSQL, MongoDB, or MySQL.
Familiarity with SecOps tools such as WAF (Web Application Firewall), Trusted Advisor, or similar security tools.
Experience with an Internal Developer Platform such as Backstage.
Experience in JavaScript.
