Applied Intuition Inc.

ML Runtime Optimization Engineer

Applied Intuition Inc., Mountain View, California, US, 94039


About the role

We are looking for a software engineer with expertise in optimizing ML models and deploying them to production-grade runtime environments and chips. You'll work across the entire ML framework/compiler stack (e.g., PyTorch, JAX, ONNX, TensorRT, CUDA, XLA, Triton).

At Applied Intuition, you will:

Build the optimization pipeline for deploying ML models to real-world hardware.

Build foundational libraries for analyzing and optimizing model performance, correctness, numerical stability, and cross-platform reproducibility.

Collaborate closely with ML developers on model architecture details to reduce the latency and resource usage of compiled models.

Learn about the variety of production-grade boards our customers use and develop computational resource strategies for different customer needs.

We're looking for someone who has:

B.Sc. in Computer Science, Mathematics, or a related field

Knowledge of and experience with ML accelerator, GPU, CPU, and SoC architecture and microarchitecture.

Proficiency in C++ and strong software development skills, with a focus on high-performance computing.

Working experience with Python.

Experience developing on or with deep learning frameworks (e.g., PyTorch, JAX, ONNX).

Nice to have:

M.Sc. or Ph.D. in an ML-related area

Built an ML compiler or optimization framework from scratch

Deployed ML solutions to embedded chips for real-time robotics applications

The salary range for this position is $125,000 - $222,000 USD annually. This salary range is an estimate, and the actual salary may vary based on the Company's compensation practices.
