Senior Site Reliability Engineer
ERPMARK INC, Mountain View, CA, United States
Senior SRE
Mountain View, CA – Onsite/Hybrid
Full-time
• Design, implement, and maintain complex data systems that support millions of customers, applying cloud-native principles and best practices to keep database systems highly available, secure, performant, and scalable
• Build and maintain CI/CD pipelines in Jenkins
• Build and deploy services in Kubernetes clusters using Helm, Kustomize, etc.
• Contribute infrastructure changes in AWS, drawing on a deep understanding of AWS services
• Take part in on-call rotations for pre-production and production systems that support millions of users
• Write and review RCA documents to prevent incidents from recurring, and share the learnings
• Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes
• Create operational playbooks, contribute to how-to articles, and gain domain knowledge to drive changes in the team
• Participate in and contribute to FMEA/chaos testing, security remediations, etc.
• Share best practices and patterns for operational excellence and cost optimization
• Reduce or eliminate manual steps by automating as much as possible
• Continuously look for opportunities to increase developer velocity and productivity
Qualifications:
• Bachelor’s or master’s degree in computer science or a related technical field; equivalent experience will be considered
• 4+ years of hands-on development and operational experience building and maintaining infrastructure in AWS
• Extensive experience with performance monitoring, troubleshooting, and tuning
• Experience with AWS services and hands-on knowledge of hosting in the cloud
• Experience with scripting languages for DevOps automation
• Experience with at least one of the following programming languages: Java, Python, or Ruby
• Knowledge of Docker, Kubernetes, and ArgoCD
• Experience with monitoring and observability using Splunk, Wavefront, AppDynamics, Prometheus, tracing, etc.