IBM

Software Engineering Site Reliability Engineer II Professional Bellevue, US

IBM, Bellevue, Washington, US, 98009


Overview

In this role, you will be part of a team that develops and supports the Apptio Kubernetes Platform (AKP), where all Apptio applications are deployed. In a typical day, you will work with GitHub, Linux, Kubernetes, ArgoCD, Docker, Confluence, Jira, Slack, and AWS.

You Are

You are passionate about problem solving and reliability and have experience in SRE or an adjacent role. Your team can count on you to solve challenging problems across the entire Apptio portfolio. You collaborate with other SREs, developers, and support teams to deliver value to the broader organization. You take responsibility for fixing problems in an automated, code-first way and are happy to step outside your comfort zone to develop your skill set.

You Aren’t

A Kubernetes or cloud expert with many years of experience. This is an intermediate position; we want you to help us, and we also want to help you grow.

Us

The Platform and Site Reliability Engineering (PRE) team at Apptio is responsible for enhancing and maintaining our Kubernetes platform and driving the adoption of SRE best practices across our engineering teams. We are a distributed team working across three locations: the United States, Poland, and Australia.

Your Role and Responsibilities

- Manage deployments of Apptio services to AKP
- Streamline the deployment process
- Improve observability of the services within your purview by reviewing KPI dashboards and alerting
- Author and maintain documentation of deployment and monitoring processes
- Use runbooks to troubleshoot and triage production issues
- Detect issues and handle Tier 1-2 troubleshooting
- Participate in online “swarm” collaboration sessions
- Collaborate with service developers
- Participate in on-call rotation
- Perform maintenance of the platform (patching, resets, upgrades, etc.)

Required Technical and Professional Expertise

- 1+ years’ experience in an SRE or adjacent role
- Foundational understanding of at least one programming language (preferably Golang) and source control
- Practical experience with distributed application deployment and management
- Practical experience with container technologies (e.g., Kubernetes, Docker)
- Practical experience with Infrastructure-as-Code (IaC): Terraform, CloudFormation, Ansible, etc.
- Experience with cloud provider services such as AWS, Azure, or Google Cloud Platform
- Familiarity with RESTful systems and their APIs
- Demonstrated fluency in the English language

Preferred Technical and Professional Expertise

- 2+ years’ experience in an SRE or adjacent role
- Familiarity with Apptio and IBM product offerings
