Credence Management Solutions, LLC
Site Reliability Engineer
Credence Management Solutions, LLC, McLean, VA
Overview
Credence Management Solutions, LLC (Credence) is seeking a Site Reliability Engineer to support a task order within GSA COMET II.
Responsibilities include, but are not limited to, the duties listed below
Education, Requirements and Qualifications
- Bachelor's or Master's degree in computer science or another highly technical, scientific discipline
- Ability to program (structured and object-oriented) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript
- Experience with cloud storage technologies as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- 5+ years of experience with Cloud Architecture, preferably AWS
- 10+ years of experience with operations of enterprise systems with more than a million users
- 10+ years of experience with application development
- 5+ years of experience in DevSecOps
- 3+ years of experience with microservices
- 5+ years of experience leading teams
- 3+ years of experience with Agile
Role & Responsibilities
- Run the production environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage/operate platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large distributed software applications
- Ensure production readiness for releases, including performance and usability testing
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
- Perform root cause analyses (RCAs) for production incidents and conduct post-incident reviews
- Optimize on-call rotations and processes
- Keep documentation and runbooks up to date