Cloudways

Principal Software Engineer, Architecture (AI/ML)

Cloudways, Denver, Colorado, United States, 80285

Do you ever wonder what happens inside the cloud?

DigitalOcean (NYSE: DOCN) simplifies cloud computing so builders can spend more time creating software that changes the world. With our mission-critical infrastructure and fully managed offerings, DigitalOcean enables startups and small and medium-sized businesses (SMBs) to rapidly deploy and scale modern applications. As a remote-first organization, our employees, like our customers, are based around the world.

We want people who are passionate about staying on top of the latest cloud infrastructure and AI/ML trends, with an excellent aptitude for supporting internal employees and teams.

We are looking for a highly experienced, highly motivated Principal Software Engineer, Architecture (AI/ML) with a Computer Science, Engineering, or AI/ML background. You will be involved in the architecture, design, implementation, verification, and integration of the next generation of DigitalOcean Cloud Computing software with a strong emphasis on AI/ML-driven solutions.

What You’ll Be Doing:

Working at the forefront of cloud, distributed computing, and AI/ML technologies.

Serving as the architect driving the technical strategy and direction for our large-scale cloud services, including machine learning model deployment and orchestration.

Developing AI/ML models to optimize cloud infrastructure, improve system reliability, and enhance user experience.

Building and refining machine learning pipelines and frameworks to support scalable AI/ML solutions.

Owning the primary responsibility for establishing a pragmatic long-term technical direction for our software services, ensuring alignment with our customers, business goals, and internal teams.

Leading a team of highly passionate technical leads to evolve our service architecture, with alignment across several product technical roadmaps.

Leading by example through direct contribution and providing direction in establishing development and operational practices, with specific attention to AI/ML model lifecycle management.

Serving as the technical lead on our most demanding, cross-functional projects.

Actively mentoring individuals and the engineering community on advanced technical issues, including best practices in AI/ML.

What We’ll Expect From You:

Architect-level experience in the following domains:

Proven expertise in large-scale cloud and AI/ML services, and a deep understanding of cloud computing’s potential in enhancing AI/ML applications.

Demonstrated ability to lead and mentor large software and AI/ML teams.

Experience with web and cloud-native services is a must-have, as is experience deploying scalable AI/ML solutions in production.

Adept at Systems Thinking, with an ability to decompose complex problems into simple, straightforward solutions, including AI/ML-specific challenges like model drift and data dependency management.

Strong grasp of system interdependencies and limitations, plus expertise in AI/ML optimization techniques for performance, scalability, and accuracy.

AI/ML Expertise:

Hands-on experience in AI/ML frameworks and libraries, such as TensorFlow, PyTorch, or scikit-learn, and model-serving frameworks such as TensorFlow Serving or ONNX Runtime.

Proven experience in developing and deploying models for performance-intensive applications at web scale.

Understanding of the MLOps lifecycle, including data engineering, model training, validation, deployment, and monitoring.

Understanding of key HPC technologies, including RDMA, InfiniBand/RoCE, and GPUDirect, as well as related storage technologies.

Knowledge of performance, scalability, enterprise system architecture, and engineering best practices, with an emphasis on the integration of AI/ML.

Ability to leverage knowledge of open-source software, industry standards, and prior art in architecture decisions with AI/ML considerations.

Ability to balance technical leadership and savvy with strong business judgment to make the right decisions about technology, demonstrating simplicity and creativity.

Master’s degree or higher preferred in Computer Science, AI/ML, or a related field.

15+ years of professional experience in web-scale system software development.

5+ years of experience with an established track record in deep learning and machine learning.

3+ years of recent experience as an ML engineer, data science engineer, or similar.

In-depth experience in two or more of the following areas: Cloud Computing, Storage, Networking, Platform-as-a-Service, Infrastructure-as-a-Service, Software-as-a-Service.

Excellent communication skills at all levels.

Why You’ll Like Working for DigitalOcean:

We are proud to work here.

You’ll be part of a cutting-edge technology company on an upward trajectory, one that is proud to simplify cloud computing so builders can spend more time creating software that changes the world.

We prioritize career development.

At DO, you’ll do the best work of your career. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that will always challenge you to think big.

We care about your well-being.

Regardless of your location, we will provide you with a competitive array of benefits to support your overall well-being, from a one-time work-from-home stipend to a wellness allowance to a flexible time-off policy, to name a few.

We reward our employees.

The salary range for this position is $225,000.00 to $338,000.00, based on market data, relevant years of experience, and skills.

We value diversity and inclusion.

We are an equal-opportunity employer and recognize that diversity of thought and background builds stronger teams and products to serve our customers.

*This is a remote role.
