Stack AV

Staff Software Engineer, ML Platform

Stack AV, San Francisco, California, United States, 94199

What We're Looking for:

We are looking for people who are passionate about delivering self-driving (L4) products that make the way we move safer, faster, and more efficient. We seek mission-driven, highly skilled people with deep experience in fast-paced, rapidly growing, tech development environments.

About the Team:

In Stack’s Machine Learning Platform’s ML Data team, we work on three areas: labeling, training dataset, and data mining. Labeling works with our vendor and automates getting our proprietary dataset annotated and ready for training. Training dataset has two goals: build the pipelines that create the datasets used for training our models, and build the infrastructure to serve datasets at high throughput. Data mining implements the infrastructure and techniques to find interesting events on the truck and offboard, while enabling the platform team and MLE teams to create hundreds of dimensions to explain our dataset slicing.

For training, you would help us build state-of-the-art infrastructure to support the read and write access patterns specific to machine learning training. This would involve OSS components such as Ray, Spark, and Iceberg.

For mining, you would build a high-throughput inference service using LLMs and a vector database. You would then help us explore in-context learning and fine-tuning to get more out of the models.

For labeling, you would set the direction and build towards auto-labeling. You would be the owner driving the labeling needs of the entire company.

What Success Looks Like:

Experience with both ML platforms and building ML-based applications (bonus points if you have modeling experience).

Experience building scalable, reliable infra in a fast-paced environment.

Ability to work across teams.

Experience building or using ML infra built for a large number of customer teams.

A deep understanding of design tradeoffs and the ability to articulate those tradeoffs and work with others to reach alignment.

Experience with building ML models or ML infra in the domains of autonomous vehicles, perception, and decision making (desirable but not required).

Experience with model training, model optimization, or large data processing pipelines.

We are especially interested in individuals who check at least two of these boxes:

Knows how to push the GPU to its limit from Python to CUDA kernel level.

Built the inference or training loop for a large model (ideally with LLM flavor).

Shipped ML products (NLP, computer vision, recommender systems, etc.) at scale to make business impact.

Has data platform experience building infrastructure for real-time querying / vector databases, batch/stream processing using Ray, Spark, or similar, and Parquet-based object storage (data lake / data warehouse).

Knows how to build low-latency / high-throughput batch or stream processing pipelines.

Knows how to write (readable) high-performance C++.

Has prior AV experience.
