Bloomberg L.P.
Senior Software Engineer - Data Science Platform - Generative AI Inference
Bloomberg L.P., New York, New York, US, 10261
Bloomberg runs on data. It's our business and our product. From the biggest banks to elite hedge funds, financial institutions need timely, accurate data to capture opportunities and evaluate risk in fast-moving markets. With petabytes of data available, a solution to transform and analyze the data is critical to our success.

Bloomberg’s Data Science Platform was established to support development efforts around data-driven science, machine learning, and business analytics. The platform aims to provide scalable compute, specialized hardware, and first-class support for a variety of workloads such as ML training jobs and inference services, Spark, and Jupyter. It was developed to provide a standard set of tooling for the Model Development Life Cycle (MDLC), from experimentation and training to inference. It is built using containerization, container orchestration, and cloud architecture, and is based on 100% open source foundations.

Production inference is a critical step in the MDLC for realizing the business value of Bloomberg AI applications, and the advent of large language models (LLMs) presents new opportunities for expanding NLP capabilities in our products. The inference solution is powered by the open-source project KServe, a production-ready inference solution for both generative and predictive AI applications. We are poised for enormous user growth this year and have an ambitious roadmap of new features and user experience improvements.

That’s where you come in. As a member of the inference team, you’ll have the opportunity to design and implement scalable, low-latency, high-throughput model inference solutions in a hybrid cloud environment. We are founding members of the KServe project, an effort to standardize ML inference within the Kubernetes ecosystem. As part of that, we regularly upstream features we develop, present at conferences, and collaborate with our peers in the industry.
Open source is at the heart of our team. It's not just something we do in our free time; it is how we work.

We’ll trust you to:
Interact with data scientists to understand their production use cases and requirements, and use them to inform the next set of GenAI features for the inference platform.
Design solutions for problems such as scalable model deployment, low-latency/high-throughput inference, GPU resource optimization, and autoscaling.
Automate operations and improve telemetry of the inference platform in our infrastructure stack.
Design solutions for our multi-cloud strategy.

You’ll need to be able to:
Innovate and design solutions that keep strict production SLAs in mind: low latency/high throughput, multi-tenancy, high availability, reliability across clusters/data centers, etc.
Fix and optimize generative inference application performance.
Provide developer and operational documentation.
Provide performance analysis and capacity planning for clusters.
Communicate and collaborate effectively with multi-functional teams.
Have a passion for providing reliable and scalable infrastructure.

You’ll need to have:
4+ years of programming experience in two or more languages (e.g., Python, Go, C++).
A degree in Computer Science, Engineering, or a similar field of study, or equivalent work experience.
Experience designing and implementing low-latency, highly scalable inference platforms.
The ability to design, develop, test, and deploy inference solutions for LLMs, and to explore emerging inference optimization techniques.
Experience debugging performance issues with distributed tracing.
Experience working with distributed, multi-tenant, multi-cluster systems.
Experience with distributed systems, e.g., Kubernetes, Kafka, RabbitMQ, Zookeeper/Etcd.
Strong knowledge of data structures and algorithms.
Linux systems experience (Network, OS, Filesystems).

We’d love to see:
Experience with large language model inference, especially the vLLM and TensorRT-LLM runtimes.
Experience with Kubeflow/KServe, MLflow, SageMaker.
Experience working with GPU compute software and hardware.
Ability to identify and perform OS and hardware-level optimizations.
Open source involvement such as a well-curated blog, accepted contribution, or community presence.
Experience with cloud LLM providers such as AWS Bedrock, Gemini, or Azure OpenAI.
Experience with configuration management systems (Terraform, Ansible).
Experience with continuous integration tools and technologies (Jenkins, Git, ChatOps).

Learn more about our work using the links below:
Keynote: Platform Building Blocks: How to build ML infrastructure with CNCF projects - https://www.youtube.com/watch?v=ncED2EMcxZ8
The State and Future of Cloud Native Model Inference - https://www.youtube.com/watch?v=786VaGAfm6I
The Hitchhiker's Guide to Kubernetes Platforms: Don’t Panic, Just Launch! - https://www.youtube.com/watch?v=a84mwXicpdc

Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law. Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email amer_recruit@bloomberg.net.