LLM Algorithmic Optimization Engineer - 6-Month Intern (start immediately, NO s
NIO, San Jose, CA, United States
NIO Inc. is a pioneer and a leading company in the premium smart electric vehicle market. Founded in November 2014, NIO’s mission is to shape a joyful lifestyle. NIO aims to build a community starting with smart electric vehicles to share joy and grow together with users.
NIO designs, develops, jointly manufactures and sells premium smart electric vehicles, driving innovations in next-generation technologies in autonomous driving, digital technologies, electric powertrains and batteries. NIO differentiates itself through its continuous technological breakthroughs and innovations, such as its industry-leading battery swapping technologies, Battery as a Service, or BaaS, as well as its proprietary autonomous driving technologies and Autonomous Driving as a Service, or ADaaS. NIO’s product portfolio consists of the ES8, a six-seater smart electric flagship SUV, the ES7 (or the EL7), a mid-large five-seater smart electric SUV, the ES6 (or the EL6), a five-seater all-round smart electric SUV, the EC7, a five-seater smart electric flagship coupe SUV, the EC6, a five-seater smart electric coupe SUV, the ET9, a smart electric executive flagship, the ET7, a smart electric flagship sedan, the ET5, a mid-size smart electric sedan, and the ET5T, a smart electric tourer.
Job Description:
- Conduct research and apply cutting-edge technologies to optimize Large Language Models (LLMs) and multimodal models: explore and implement core algorithmic optimizations on heterogeneous architectures for highly efficient LLM inference and for deployment across distributed, heterogeneous hardware environments (a minimal sketch of one such optimization appears after this list).
- Focus on model optimization from a systems perspective, ensuring efficient deployment in the vehicle’s digital cockpit and advanced driving (AD) domains.
- Collaborate with cross-functional teams to ensure the integration of optimized models into real-world automotive applications.
- Contribute to the entire pipeline, from research, development, and testing through to deployment on hardware, including GPUs and other distributed systems.
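
For candidates wondering what this work looks like in practice, here is a minimal sketch (illustrative only, not NIO’s production stack): applying PyTorch’s dynamic INT8 quantization to a toy feed-forward block of the kind found in transformer layers. All module sizes are hypothetical.

    import torch
    import torch.nn as nn

    # Toy feed-forward block standing in for one LLM layer
    # (hypothetical sizes; real work would target a full LLM).
    model = nn.Sequential(
        nn.Linear(512, 2048),
        nn.ReLU(),
        nn.Linear(2048, 512),
    )
    model.eval()

    # Dynamic INT8 quantization: Linear weights are stored as int8,
    # and activations are quantized on the fly at inference time.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.no_grad():
        out = quantized(torch.randn(1, 512))
    print(out.shape)  # torch.Size([1, 512])

Dynamic quantization trades a small accuracy loss for roughly a 4x reduction in Linear-layer weight memory and faster CPU inference, the kind of trade-off that is central to deploying models on resource-constrained in-vehicle hardware.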
Qualifications:
- Currently pursuing or having completed a PhD or Master’s degree in Computer Science, Computer Engineering, Applied Mathematics, Communications, Electronics, or a related field, with relevant research projects and publications.
- Strong understanding of GPU/NPU architecture and optimization techniques to identify and address bottlenecks.
- Proficient in LLM and VLM architectures and algorithms; familiar with transformer-based NLP, audio, and CV algorithms and technologies.
- Proficiency in Python and experience with AI-related training and inference tools such as PyTorch.
- Proficiency in C/C++ programming is a MUST.
- Hands-on experience with model export and serving frameworks such as Open Neural Network Exchange (ONNX) and ONNX Runtime; a minimal export-and-run sketch appears after this list.
- Familiarity with debugging code in distributed computing environments.
- Experience with LLM inference backends (e.g., llama.cpp) is a big plus.
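
As referenced above, here is a minimal sketch of the ONNX workflow: exporting a small PyTorch module and running it with ONNX Runtime. The module, tensor shapes, and the file name "toy.onnx" are hypothetical, chosen only for illustration.

    import numpy as np
    import torch
    import torch.nn as nn
    import onnxruntime as ort

    model = nn.Linear(16, 4)
    model.eval()
    dummy = torch.randn(1, 16)

    # Export the module to the ONNX exchange format.
    torch.onnx.export(
        model, dummy, "toy.onnx",
        input_names=["input"], output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )

    # Run the exported graph with ONNX Runtime on CPU.
    sess = ort.InferenceSession("toy.onnx", providers=["CPUExecutionProvider"])
    onnx_out = sess.run(["output"], {"input": dummy.numpy()})[0]

    # Sanity-check parity against the original PyTorch output.
    ref = model(dummy).detach().numpy()
    print(np.allclose(onnx_out, ref, atol=1e-5))  # True

Checking numerical parity between the source framework and the exported graph, as in the last step, is a routine part of validating any model-serving pipeline.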
Preferred Qualifications:
- PhD in Computer Science, Artificial Intelligence, or a related field; or a Master’s degree plus 3 years of relevant industry experience
- Experience with inference optimization techniques for deep learning models or libraries on specific hardware architectures
- Familiarity with microkernel architectures, the Linux kernel, hypervisors, middleware, and application frameworks
- A strong publication record, including high-impact, innovative papers, is preferred