Abridge AI Inc.

ML Infrastructure Engineer (Staff/Senior)

Abridge AI Inc., San Francisco, California, United States, 94199


Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients.

Our enterprise-grade technology transforms patient-clinician conversations into structured clinical notes in real time, with deep EMR integrations. Powered by Linked Evidence and our purpose-built, auditable AI, we are the only company that maps AI-generated summaries to ground truth, helping providers quickly trust and verify the output. As pioneers in generative AI for healthcare, we are setting the industry standards for the responsible deployment of AI across health systems.

We are a growing team of practicing MDs, AI scientists, PhDs, creatives, technologists, and engineers working together to empower people and make care make more sense.

The Role

As an ML Infrastructure Engineer at Abridge, you will be responsible for scaling and deploying machine learning models to handle increasing traffic demands and integrating them with various platforms. You'll play a pivotal role in building a scalable infrastructure that not only supports current deployments but also lays the foundation for long-term growth. Your role will be critical in ensuring our AI-driven healthcare platform is powered by robust, scalable, and efficiently deployed models.

What You'll Do

Architect, design, and implement ML software systems for deploying and managing models at scale.
Stand up ML models for inference, starting with critical models like the 'linkages' model, and ensure they are capable of handling traffic increases.
Develop and maintain infrastructure that supports efficient ML operations, including model evaluations, deployments, and training at scale.
Collaborate closely with ML researchers, engineers, and cross-functional teams to ensure seamless integration of models with services like Zoom and Athena.
Work with stakeholders across machine learning and operations teams to iterate on systems design and implementation.
Optimize and maintain the performance of ML systems to ensure high availability, fault tolerance, and smooth scalability.
Troubleshoot production issues and continuously improve systems to enhance performance and efficiency.

What You'll Bring

5+ years of experience in ML model deployment and scaling, with a focus on production-quality software.
Strong proficiency in Python and Kubernetes, with experience building scalable ML infrastructure.
Expertise in designing fault-tolerant, highly available systems.
Experience working with cloud environments, Infrastructure as Code (IaC), and managing deployments using Kubernetes.
Proficiency in optimizing system performance, debugging production issues, and designing systems for scalability and security.
Experience in software design and architecture for highly available machine learning systems for use cases like inference, evaluation, and experimentation.
Excellent understanding of low-level operating systems concepts, including multi-threading, memory management, networking and storage, performance, and scale.
Bachelor's or Master's degree or higher in Computer Science/Engineering, Statistics, Mathematics, or equivalent.
Excellent interpersonal and written communication skills.

Ideally, You Have

Experience with large-scale ML platforms like Ray, Databricks, or AnyScale.
Expertise with ML toolchains such as PyTorch or TensorFlow.
Proven experience working with distributed systems and handling inference at scale.
Background in working with teams and leaders to deliver impactful ML-powered solutions in fast-paced environments.
Demonstrated experience incubating and productionizing new technology, working closely with research scientists and technical teams from idea generation through implementation.

Why Work at Abridge?

Be a part of a trailblazing, mission-driven organization that is powering deeper understanding in healthcare through AI!
Opportunity to work and grow with talented individuals and have ownership and impact at a high-growth startup.
Flexible/Unlimited PTO: Salaried team members can take as much approved time off as they need, plus 13 paid holidays.
Equity: For all salaried team members.
Medical insurance: We pay 100% of the premium for you + 75% for dependents. 3 Aetna plans to choose from.
Dental & Vision insurance: We pay 100% of the premium for you + 75% for dependents. 2 Aetna plans to choose from.
Flexible Spending (FSA) & Health Savings (HSA) Accounts.
Learning and Development budget: $3,000 per year for coaching, courses, workshops, conferences, etc.
401k Plan: Contribute pre-tax dollars toward retirement savings.
Paid Parental Leave: 16 weeks paid parental leave for all full-time employees.
Flexible working hours: We care more about what you accomplish than what specific hours you’re working.
Home Office Budget: We provide up to $1,600 in a one-time reimbursement to set up your home office.
Sabbatical Leave: 30 days of paid Sabbatical Leave after 5 years of employment.
Plus much more!

Diversity & Inclusion

Abridge is an equal opportunity employer. Diversity and inclusion is at the core of what we do. We actively welcome applicants from all backgrounds (including but not limited to race, gender, educational background, and sexual orientation).
