Inworld AI

Staff/Principal Machine Learning Engineer, Speech

Inworld AI, Mountain View, CA, United States

Why Join Inworld

Inworld is the best-funded startup in AI and games, with a $500 million valuation and backing from top-tier investors including Intel Capital, Microsoft’s M12 fund, Lightspeed Venture Partners, Section 32, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, and First Spark Ventures.

Inworld is the leading AI engine for games and interactive media. Inworld’s suite of AI components enables developers to build interactive, responsive, and personalized AI gaming experiences, orchestrate models to create intelligent game behaviors, and unlock enhanced productivity with AI-generated content. Inworld powers experiences built by Ubisoft, NVIDIA, Niantic, NetEase Games and LG, among others, and has partnerships with key industry players such as Microsoft Xbox, Epic Games, and Unity.

Inworld was recognized by CB Insights as one of the 100 most promising AI companies in the world in 2024 and was also named among LinkedIn's Top Startups of 2024 in the USA.

We are seeking Staff and Principal Machine Learning Speech Engineers with extensive experience in R&D of text-to-speech (TTS) and speech-to-text (STT) technologies. In this role, you will be at the forefront of building the generative AI stack that powers next-generation AI characters.

Minimum Qualifications

Bachelor’s degree in Computer Science, Engineering, or a similar technical field.
6+ years of experience with software development in one or more programming languages, machine learning algorithms and tools (e.g., PyTorch), artificial intelligence, deep learning and/or natural language processing.
Excellent problem-solving skills and the ability to work both independently and as part of a team.

Preferred Qualifications

Master's degree or PhD in speech synthesis/recognition or adjacent fields.
5+ years of experience with software design and architecture, and with testing and launching software products.
1+ years of experience sourcing and curating speech datasets.
1+ years of experience in a technical leadership role leading project teams and setting technical direction.
1+ years of experience building end-to-end speech processing systems and real-time applications.

Responsibilities

Research and experiment with cutting-edge ML techniques for TTS and STT applications.
Develop and test production-grade training and inference pipelines for TTS and STT applications.
Understand optimization problems in the area of speech, signals, and natural language processing.
Collaborate with cross-functional teams to integrate speech technologies into products.

In-office location: Mountain View, CA, United States.

Remote locations: United States and Canada.

The US base salary range for this full-time position is $240,000 - $385,000. In addition to base pay, total compensation includes equity and benefits. Within the range, individual pay is determined by work location, level, and additional factors, including competencies, experience, and business needs. The base pay range is subject to change and may be modified in the future.