Anthropic Limited
Performance Engineer
Anthropic Limited, San Francisco, California, United States, 94199
Running machine learning (ML) algorithms at our scale often requires solving novel systems problems. As a Performance Engineer, you'll be responsible for identifying these problems and developing systems that optimize the throughput and robustness of our largest distributed systems. Strong candidates will have a track record of solving large-scale systems problems and will be excited to grow into ML expertise as well.
You may be a good fit if you:
Have significant software engineering or machine learning experience, particularly at supercomputing scale
Are results-oriented, with a bias towards flexibility and impact
Pick up slack, even if it goes outside your job description
Enjoy pair programming (we love to pair!)
Want to learn more about machine learning research
Care about the societal impacts of your work
Strong candidates may also have experience with:
High performance, large-scale ML systems
GPU/Accelerator programming
ML framework internals
OS internals
Language modeling with transformers
Representative projects:
Implement low-latency, high-throughput sampling for large language models
Implement GPU kernels to adapt our models to low-precision inference
Write a custom load-balancing algorithm to optimize serving efficiency
Build quantitative models of system performance
Design and implement a fault-tolerant distributed system running over a complex network topology
Debug kernel-level network latency spikes in a containerized environment
Deadline to apply: None. Applications will be reviewed on a rolling basis.