Acceler8 Talent
Member of Technical Staff
Acceler8 Talent, Palo Alto, California, United States, 94306
Shape the Future of Conversational AI

About Us
We are a public benefit corporation dedicated to harnessing advanced large language models to build an AI platform tailored to enterprise needs, with a particular focus on conversational AI. Our team is composed of friendly, innovative, and collaborative individuals committed to developing impactful AI solutions.

About the Role: Research Engineer (Inference)
You will join our inference team, which ensures that our high-performance models operate efficiently and effectively in real-world enterprise deployments. As a research engineer, you will optimize model inference, minimize latency, and improve throughput while preserving model quality, all to ensure robust deployment in enterprise settings.

Key Responsibilities
- Deploy and optimize large language models (LLMs) for inference in both cloud and on-premises environments.
- Use model optimization and acceleration tools and frameworks such as ONNX, TensorRT, or TVM.
- Tackle complex challenges related to model performance and scalability.
- Reason about the trade-offs involved in model inference, including hardware limitations and real-time processing requirements.
- Apply strong PyTorch skills, along with familiarity with infrastructure tools such as Docker and Kubernetes, to deploy inference pipelines.

What We Are Looking For
If you have a strong background in deploying and optimizing LLMs, enjoy solving intricate problems, and have a deep understanding of model inference challenges, we would love to hear from you! Join us in building impactful enterprise AI solutions that will shape the future.