WEX, Inc.

Site Reliability Engineer 1

WEX, Inc., Boston, Massachusetts, US, 02298

About the Team/Role

The WEX Site Reliability Engineering (SRE) team is looking for a motivated and quick-learning Level 1 Site Reliability Engineer to join our growing team. We are passionate about developing software and solutions for observability, incident response, reliability, performance, operational excellence, and compliance. As a member of the SRE organization, you will support internal stakeholders and Engineering teams, tackling complex challenges and enhancing our engineering teams' and customers' experience. You will have the opportunity to work alongside experienced SREs and gain valuable hands-on experience in a dynamic and supportive environment.

As a Level 1 SRE at WEX, you will:

- Learn the fundamentals of SRE: Gain a solid understanding of core SRE principles, including monitoring, incident management, and automation.
- Develop basic automation scripts: Use scripting languages like Python or Bash to automate simple tasks and improve operational efficiency.
- Triage and resolve incidents: Participate in on-call rotations, assisting with the identification and resolution of incidents under the guidance of senior SREs.
- Monitor system health: Utilize monitoring tools to identify and escalate potential issues, ensuring the stability and performance of our systems.
- Collaborate with development teams: Work closely with software engineers to understand their systems and provide operational support.
- Contribute to documentation: Help maintain and improve internal documentation, including runbooks, knowledge base articles, and playbooks.
- Continuously learn and grow: Expand your knowledge of cloud technologies, DevOps practices, and SRE tools through internal and external training opportunities.

How you'll make an impact

- Develop a basic understanding of code, networking, operating systems, and storage solutions: You'll be able to identify and troubleshoot common issues related to these areas.
- Assist in developing automation and utilizing monitoring tools to ensure system reliability: You'll learn how to use tools to automate tasks and monitor system health.
- Participate in incident response and troubleshooting alongside senior SREs: You'll gain experience in identifying, escalating, and resolving incidents.
- Participate in 24x7 Site Reliability rotations and escalation workflows with guidance from senior team members: You'll learn how to respond to incidents and escalate issues appropriately.
- Learn to identify and address basic performance bottlenecks: This will include understanding code optimization, configuration changes, and infrastructure upgrade recommendations.
- Collaborate with development teams to ensure software design meets operational requirements: You'll learn how to communicate effectively with developers and advocate for operational best practices.
- Work with development teams to make sure operational needs are met by assisting with support requests from other engineering teams: You'll gain experience in providing support and collaborating with different teams.
- Contribute to the continuous improvement of processes and procedures to increase system reliability and efficiency: You'll participate in team discussions and contribute ideas for improvement.
- Stay up-to-date with the latest industry trends and technologies: You'll be encouraged to learn new technologies and share your knowledge with the team.

Experience you'll bring

- Basic understanding of at least one major programming language: C#, Java, GoLang, Python. You should be able to read and understand code, and write scripts.
- Familiarity with a Cloud Computing platform (AWS, Azure, or GCP): You should have a basic understanding of cloud concepts and services.
- Strong communication and collaboration skills: You'll be working closely with different teams, so effective communication is essential.
- BA/BS degree in Computer Science or related technical field or equivalent job experience: A strong foundation in computer science principles is important.

Nice to have

- Basic understanding of infrastructure as code, preferably Terraform: Familiarity with IaC concepts and tools is a plus.
- Working knowledge of RESTful APIs: Understanding how APIs work is beneficial.
- Exposure to observability and logging technologies: Any experience with monitoring and logging tools is helpful.
- Experience with at least one major RDBMS and NoSQL data store: Familiarity with databases is a plus.
- Exposure to containerization technologies such as Docker or Kubernetes: Basic knowledge of containers and orchestration is beneficial.
- Familiarity with GitOps: Understanding of GitOps principles is helpful.
