Waymo

Software Engineer, Foundation Model Data Infra

Waymo, Mountain View, California, US, 94039


Waymo is an autonomous driving technology company with the mission to be the most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver - The World's Most Experienced Driver - to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo One, a fully autonomous ride-hailing service, and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over one million rider-only trips, enabled by its experience autonomously driving tens of millions of miles on public roads and tens of billions of miles in simulation across 13+ U.S. states.

The Perception team at Waymo is at the heart of our autonomous vehicles, with a mission to perceive and represent everything from pedestrians to vehicles to long-tail phenomena. Perception Data is the core function that owns the infrastructure and tools to understand vast amounts of Waymo driving data, curate datasets, ensure data availability and quality, and accelerate the development of Perception and the Waymo Driver. This spans areas from performant onboard real-time systems to scalable systems that deliver data to power Perception's machine learning (ML) models, diverse ML task evaluation, foundation model development, and commercialization.

In this hybrid role, you will report to a Software Engineering Manager.

You will:

Develop infrastructure and frameworks for generating reliable, high-quality data for Perception foundation model development at the tens-of-petabytes scale.

Deploy and maintain end-to-end data pipelines that deliver the foundation model data.

Curate and deliver datasets, covering both nominal and rare instances, that support the diverse ML tasks for Perception foundation models.

Collaborate with the foundation model, Perception evaluation, and ML infrastructure teams to integrate data generation into model training and evaluation frameworks.

Collaborate with production teams to deploy foundation models in production settings.

Be responsible for infrastructure reliability, data availability, and data quality.

At a minimum we'd like you to have:

Outstanding programming skills in C++ or Python.

Hands-on experience building large-scale data processing and retrieval systems and pipelines, e.g., Apache Spark, Apache Beam, Google Cloud Dataflow, AWS Data Pipeline, or Faiss.

3+ years of experience in ML data engineering, including data pipelines, data curation, and data balancing.

3+ years of experience working in integrated settings to support data users and collaborating with infrastructure partners; a customer-oriented mindset.

MS and 1+ years of industry experience, or equivalent.
