JPMorgan Chase
Software Engineer III - AI/ML Technology
JPMorgan Chase, McLean, Virginia, US 22107
Take your AI/ML engineering skills to the next level: you will achieve state-of-the-art throughput for critical models using advanced techniques such as model parallelism and distributed training, and reduce inference time for new model architectures through optimizations like quantization and pruning. You will collaborate closely with Applied AI engineering to optimize the internal inference stack, leveraging technologies such as TensorRT and ONNX.
As a Software Engineer III - Machine Learning Engineer at JPMorgan Chase within the Corporate Sector, specifically the AIML Technology division, you will play a crucial role in an agile team. Your responsibilities will include enhancing, developing, and delivering a company-wide AI/ML/Data Platform in a secure, stable, and scalable manner. As a technical contributor, your duties will encompass the architecture, design, and construction of AI/ML related capabilities using cloud technology. You will have the opportunity to work with both traditional AI/ML and Generative AI.
Job Responsibilities
- Designs and implements distributed ML infrastructure, including inference, training, scheduling, orchestration, and storage.
- Develops advanced monitoring and management tools for high reliability and scalability.
- Optimizes system performance by identifying and resolving inefficiencies and bottlenecks.
- Collaborates with product teams to deliver tailored, technology-driven solutions.
- Drives the adoption and execution of ML Platform tools across various teams.
- Integrates Generative AI within the ML Platform using state-of-the-art techniques.
Required Qualifications, Capabilities, and Skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience.
- Hands-on experience with ML frameworks (TensorFlow, PyTorch, JAX, scikit-learn).
- Experience with a public cloud provider (AWS, Azure, GCP) and addressing non-functional requirements such as scalability and cross-region resiliency.
- Strong coding skills and experience developing large-scale ML systems while following software engineering best practices.
- Experience with prompt engineering and interacting with various LLM vendors and models.
- Proven track record of contributing to and optimizing open-source ML frameworks.
- Proven ability to identify trade-offs, clarify project ambiguities, and drive decision-making.
Preferred Qualifications, Capabilities, and Skills
- Experience with the Kubernetes ecosystem, including EKS, Helm, and custom operators.
- Background in High Performance Computing, ML hardware acceleration (e.g., GPU, TPU, RDMA), or ML for Systems.