Site Reliability Engineer
Onebrief, Inc., Honolulu, HI, United States
About us
Onebrief is a revolutionary platform for military staff workflows and operational planning. The software is designed to enable smarter, real-time decisions. With unparalleled collaboration features, AI-enhanced tools, and customizable workflows, Onebrief makes staffs superhuman. The expanding roster of customers includes COCOMs and Service Components worldwide.
Founded in 2017 by a group of experienced planners, Onebrief today has a workforce of 120+ that spans veterans from all forces and global organizations, and technologists from leading-edge software giants. Onebrief’s growth is exemplary: the company has raised $53M+ and counting from leading venture investors.
What you will achieve
You’ll be the first line of support for our mission-critical deployments, and responsible for ensuring best-in-class service quality and issue resolution. You’ll work in both on-premise DoD environments and AWS cloud environments. Your lessons from the field will shape how our team works, from policy to implementation.
In addition to working on site with customers, you’ll contribute directly to solutions that increase the stability, performance, and security of our deployments, and improve the overall experience of deploying and managing Onebrief on-premise.
This role sits in our Technology Operations organization.
Core skills and technologies: VMware, Kubernetes, Docker, Helm, Ansible, Terraform, Linux, AWS, DoD compliance
About You
This is an opportunity for candidates who are able to travel and work outside regular business hours with moderate frequency.
Qualifications
You are a trained and experienced (3+ years) SRE who automates software delivery and deployment and provides documentation and self-service tools to engineering teams and customers
You have an active security clearance, are familiar with the DoD IT environment, and have first-hand experience managing mission-critical systems inside DoD’s air-gapped networks.
You are experienced working in DoD on-premise environments and AWS cloud environments.
You understand Linux, containers, virtual machines, and Kubernetes, and know how to harden them in accordance with RMF security controls and STIGs/SRGs
You are proficient using VMware, Docker, Helm, Ansible, and Terraform
You are at home at the command line
You have experience setting up backups, logging, and alerting at various layers of the OSI model to ensure service meets SLAs and customer expectations
You are experienced with system monitoring and can proactively identify future issues and needs for increased capacity
You have a strong understanding of incident response processes and how to conduct root cause analysis
You are creative and self-reliant, able to operate in air-gapped environments with limited tools and little external help
You work well with developers to ensure efficient and secure development and operations
You have a Security+ certification or another DoD 8570.01-approved security certification.
Most importantly, you are a true Onebriefer:
You are obsessed with creating value for real users
You are ambitious, scrappy, and a creative problem-solver
You learn quickly, work iteratively, and naturally seek collaboration
You approach your work with integrity, intellectual honesty, and a low ego
You communicate frankly, clearly, and succinctly
You thrive as a self-starter, embracing autonomy and ambiguity
You are a U.S. citizen