Inabia Software & Consulting Inc.

Site Reliability Engineer (Fintech)

Inabia Software & Consulting Inc., San Francisco, CA, United States

The client is looking for candidates with 10+ years of experience.

Title: Site Reliability Engineer (Fintech)
Duration: Contract
Location: Bellevue, WA; Frisco, TX; Atlanta, GA; or Overland Park, KS (Hybrid)

Local candidates required.
All work authorizations are acceptable except OPT and H1T.
When sharing a resume, please mention the candidate's location and work authorization.

Job Description:

We are looking for an experienced Platform Site Reliability Engineer (SRE) with deep expertise in Kubernetes and AWS to help us enhance the performance, scalability, and reliability of our Digital Payment platform. You will play a critical role in ensuring the availability and resilience of our cloud-native services, with a focus on automation, monitoring, and performance optimization.

Key Responsibilities:

Kubernetes Management: Deploy, manage, and optimize Kubernetes clusters in production and staging environments, ensuring high availability and efficient resource utilization.
AWS Infrastructure: Leverage AWS cloud services (EC2, S3, RDS, EKS, Lambda, etc.) to build, manage, and scale cloud-native infrastructure.
Automation & Infrastructure as Code: Develop and maintain automated workflows using Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Ansible.
CI/CD Pipeline Support: Build, optimize, and maintain CI/CD pipelines using tools like Jenkins, GitLab CI, or CircleCI.
Monitoring & Observability: Implement and maintain monitoring, alerting, and logging solutions using tools such as Prometheus, Grafana, CloudWatch, or ELK stack.
Incident Response: Lead and support incident response efforts, conduct root cause analysis, and implement post-incident reviews.
Performance Optimization: Identify and resolve performance bottlenecks, improve system efficiency, and ensure applications and infrastructure are optimized.
Security & Compliance: Work with security teams to implement best practices for securing Kubernetes clusters and AWS resources.
Collaboration & Documentation: Work closely with development, DevOps, and infrastructure teams to align on best practices and document procedures.

Required Qualifications:

Experience: 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
Kubernetes Expertise: Strong expertise in managing and scaling Kubernetes clusters.
AWS Cloud Expertise: Proficiency with AWS services such as EC2, S3, EKS, RDS, and others.
Infrastructure as Code (IaC): Hands-on experience with IaC tools such as Terraform or AWS CloudFormation.
CI/CD Pipelines: Experience building and maintaining CI/CD pipelines using Jenkins, GitLab CI, or similar tools.
Scripting & Automation: Proficiency in scripting languages such as Python, Bash, or Go.
Monitoring & Logging: Experience with monitoring, logging, and alerting tools.
Troubleshooting & Incident Management: Ability to troubleshoot complex issues in distributed systems.
Collaboration Skills: Strong communication skills with the ability to work collaboratively.
