Databricks

GenAI Staff Machine Learning Engineer, Performance Optimization

Databricks, San Francisco, California, United States, 94199


P-984

Founded in late 2020 by a small group of machine learning researchers, Mosaic AI enables companies to create state-of-the-art AI models from scratch on their own data. From a business perspective, Mosaic AI is committed to the belief that a company’s AI models are just as valuable as any other core IP, and that high-quality AI models should be available to all. From a scientific perspective, Mosaic AI is committed to reducing the cost of training state-of-the-art models - and sharing our knowledge about how to do so with the world - to allow everyone to innovate and create models of their own.

Now part of Databricks since July 2023 as the GenAI Team, we are passionate about enabling our customers to solve the world's toughest problems by building and running the world's best data and AI platform. We leap at every opportunity to solve technical challenges, striving to empower our customers with the best data and AI capabilities.

You will:

Explore and analyze performance bottlenecks in ML training and inference

Design, implement, and benchmark libraries and methods to overcome these bottlenecks

Build tools for performance profiling, analysis, and estimation for ML training and inference

Balance the tradeoff between performance and usability for our customers

Support our community through documentation, talks, tutorials, and collaborations

Collaborate with external researchers and leading AI companies on various efficiency methods

We look for:

Hands-on experience with the internals of deep learning frameworks (e.g., PyTorch, TensorFlow) and deep learning models

Experience with high-performance linear algebra libraries such as cuDNN, CUTLASS, Eigen, or MKL

General experience with the training and deployment of ML models

Experience with compiler technologies relevant to machine learning

Experience with distributed systems development or distributed ML workloads

Hands-on experience writing CUDA code and knowledge of GPU internals (Preferred)

Publications in top-tier ML or systems conferences such as MLSys, ICML, ICLR, KDD, or NeurIPS (Preferred)

We value candidates who are curious about all parts of the company's success and are willing to learn new technologies along the way.
