IBM

Software Engineering Senior Site Reliability Engineer Professional Bellevue, US

IBM, Bellevue, Washington, US, 98009


Introduction

Overview
In this role, you will be part of a team that develops and supports the Apptio Kubernetes Platform (AKP), where all Apptio applications are deployed. In a typical day you will interact with GitHub, Linux, Kubernetes, ArgoCD, Docker, Confluence, Jira, Slack, and AWS.

You Are
You are passionate about problem solving and reliability and have significant experience in SRE or an adjacent role. Your team can count on you to solve challenging problems across the entire Apptio portfolio. You collaborate with other SREs, developers, and support teams to help provide value to the broader organization. You take responsibility for fixing problems in an automated, code-first way and are happy to step outside your comfort zone to develop your skillset. You are a mentor to other engineers and able to assist management in key decision-making.

Us
The Platform and Site Reliability Engineering (PRE) team at Apptio is responsible for enhancing and maintaining our Kubernetes platform and driving the adoption of SRE best practices across our engineering teams. We are a distributed team working across three locations: the United States, Poland, and Australia.

Your Role and Responsibilities
- Manage deployments of Apptio services to AKP
- Streamline the deployment process
- Improve observability of the services within your purview by reviewing KPI dashboards and alerting
- Mentor junior to mid-level engineers
- Author and maintain documentation of deployment and monitoring processes
- Write and use runbooks to troubleshoot and triage production issues
- Detect issues and handle Tier 3 troubleshooting
- Drive online "swarm" collaboration sessions
- Collaborate with service developers
- Participate in on-call rotation
- Perform maintenance of the platform (patching, resets, upgrades, etc.)
- Operate independently and own end-to-end delivery of solutions
- Have significant input in the product roadmap and be able to articulate effectively the benefits of alternative technologies

Required Technical and Professional Expertise
- 5+ years' experience in an SRE or adjacent role
- Functional understanding of at least one programming language (preferably Golang) and source control
- Expertise with distributed application deployment and management via Kubernetes
- Expertise with container technologies (e.g., Kubernetes, Docker)
- Expertise with Infrastructure-as-Code (IaC) concepts (Terraform)
- Expertise with cloud provider services, preferably AWS
- Ability to work with RESTful systems and their APIs
- Familiarity with observability (e.g., Prometheus, OpenTelemetry)
- Demonstrated fluency in English

Preferred Technical and Professional Expertise
- 7+ years' experience in an SRE or adjacent role
