Machine Learning Engineer, Platform & Data
Adobe Inc., San Jose, CA, United States
Our Company
Changing the world through digital experiences is what Adobe's all about. We give everyone—from emerging artists to global brands—everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.
We're on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!
The Opportunity
Firefly is the new family of creative generative AI models coming to Adobe products that offers a new way to conceptualize, build, and scale content. It's a natural extension of the technology Adobe has produced over the past 40 years.
At the core of Firefly are our commercially safe AI models trained on hundreds of millions of images owned or licensed by Adobe. We are hiring for a highly strategic and visible role to help evolve these models. This is an opportunity to reach millions of creatives, helping them reinvent the way they work.
Responsibilities:
- Design, architect and build cloud ML platform solutions related to, but not limited to, resource management, monitoring, allocation, and job scheduling.
- Design, architect and build reliability, observability and utilization infrastructure for cloud computing resources.
- Collaborate with data platform engineers and architects to seamlessly integrate low-latency data pipelines into the ML platform for model training.
- Collaborate with machine learning scientists and data scientists to identify and address requirements in order to reduce training cost and turnaround time on the ML platform.
- Monitor machine learning platform performance and modify infrastructure to fit fluid cloud resource needs.
- Write high-quality, product-level code that is easy to maintain and test, following standard methodologies.
Key skill requirements:
- Proficiency in at least two of: Linux, Ansible, Docker, Kubernetes (3+ yrs)
- Expert in Python and/or C++
- Experience in distributed computing (3+ yrs)
- Experience in HDFS, Spark, Presto (3+ yrs)
- Experience working with AWS or similar cloud infrastructure (3+ yrs)
- Experience with hardware resource management for ML training and/or deployment
- B.S., M.S., or Ph.D. in Computer Science, Computer Engineering or a related area