Staff Cloud DevOps/Site Reliability Engineer
Inworld AI, Mountain View, CA, United States
Why Join Inworld
Inworld is the best-funded startup in AI and games with a $500 million valuation and backing from top-tier investors including Intel Capital, Microsoft’s M12 fund, Lightspeed Venture Partners, Section 32, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, and First Spark Ventures.
Inworld is the leading AI engine for games and interactive media. Inworld’s suite of AI components enables developers to build interactive, responsive, and personalized AI gaming experiences, orchestrate models to create intelligent game behaviors, and unlock enhanced productivity with AI-generated content. Inworld powers experiences built by Ubisoft, NVIDIA, Niantic, NetEase Games and LG, among others, and has partnerships with key industry players such as Microsoft Xbox, Epic Games, and Unity.
Inworld was recognized by CB Insights as one of the 100 most promising AI companies in the world in 2024 and was also named among LinkedIn's Top Startups of 2024 in the USA.
Our Technical Operations team manages the infrastructure, DevOps, and Site Reliability of our platform. We are looking for a Staff Cloud DevOps/Site Reliability Engineer to join our team.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field
- 7+ years of experience as a DevOps, Infrastructure, Operations, or Site Reliability Engineer (or as a software engineer with relevant experience)
At least 2 years of experience with each of the following:
- Terraform
- Helm
- Kubernetes
- AWS, Azure, or GCP
- CI/CD using modern tools (GitOps)
Nice-to-Have:
- MLOps (building, orchestrating, and maintaining Machine Learning Pipelines)
- Prometheus / Grafana
- Multi-cloud deployments (2 or more cloud providers)
- ArgoCD
- Network management and VPNs
Responsibilities
- Infrastructure: Maintain and contribute to Infrastructure-as-Code (Terraform)
- DevOps and CI/CD Pipelines: Orchestrate pipelines using GitHub Actions, Helm, and ArgoCD
- Microservices scalability: Kubernetes Administration
- Cloud Administration
- Site Reliability: Measure and monitor availability, latency, and overall service health; drive incident management and post-mortem analysis
In-office location: Mountain View, CA, United States.
Remote location: United States.
The US base salary range for this full-time position is $180,000 - $280,000. In addition to base pay, total compensation includes equity and benefits. Within the range, individual pay is determined by work location, level, and additional factors, including competencies, experience, and business needs. The base pay range is subject to change and may be modified in the future.