WEX

Staff AI Infrastructure Engineer

WEX, San Francisco, California, United States, 94199


(*) The candidate must reside within 30 miles of one of the following locations: San Francisco Bay Area, CA; Portland, ME; Chicago, IL.

About the Team/Role

WEX is an innovative global commerce platform and payments technology company looking to forge the way in a rapidly changing environment, to simplify the business of doing business for customers, freeing them to spend more time, with less worry, on the things they love and care about. We are journeying to build a consistent world-class user experience across our products and services and leverage customer-focused innovations across all our strategic initiatives, including big data, AI, and Risk. Our AI Infrastructure team is pivotal in enabling these advancements.

We are looking for a highly motivated, high-potential Staff Engineer to join our AI Infrastructure team, where you will make significant contributions to our cloud-based AI solutions and grow your career.

This is an exciting time to join the AI Infrastructure team at WEX as a technical leader. Our team is responsible for building and maintaining the robust, scalable, and secure cloud infrastructure that powers our AI and machine learning initiatives. We work with cutting-edge technologies like AWS, Azure, Docker, and Kubernetes to create a dynamic environment that supports the development and deployment of AI models at scale.

We have challenging, high-impact business problems for you to lead, work on, and grow with. We also have a strong team of highly talented and skilled engineers and leaders to support, guide, and coach you.

If you aspire to be a strong engineer who can solve tough problems, lead, make a big impact, and grow fast, this is a great opportunity for you!

How you'll make an impact

Collaborate with data scientists, ML engineers, and stakeholders to understand the requirements and challenges of AI/ML workloads.

Design, implement, and maintain highly scalable and secure cloud infrastructure on AWS and Azure to support AI/ML workloads using IaC technologies like Terraform.

Architect and manage containerization (Docker) and orchestration (Kubernetes) for efficient deployment and scaling of AI/ML applications.

Develop and optimize CI/CD pipelines for automating the build, test, and deployment of AI/ML models and infrastructure.

Implement and manage robust monitoring and alerting systems to ensure the health, performance, and reliability of production AI infrastructure.

Proactively analyze system performance data to identify bottlenecks, optimize resource utilization, and improve overall efficiency.

Stay current with emerging cloud technologies, tools, and best practices in the AI/ML infrastructure space.

Mentor and guide junior team members, fostering a culture of continuous learning and knowledge sharing.

Contribute to the team's technical roadmap and strategic initiatives.

Troubleshoot complex technical issues and provide timely solutions.

Participate in on-call rotation to ensure 24/7 availability and support of critical AI infrastructure.

Drive the adoption of best practices and standards for AI infrastructure development and operations.

Lead technical design and architecture discussions, ensuring alignment with business goals and long-term scalability.

Proactively identify and address potential risks and issues related to AI infrastructure.

Collaborate with cross-functional teams to drive the successful implementation of AI/ML projects.

Contribute to the development of the team's technical capabilities and expertise.

Represent the team in technical discussions and presentations with internal and external stakeholders.

Experience you'll bring

Bachelor's degree in Computer Science, Software Engineering, or a related field, or demonstrable equivalent understanding, experience, and capability.

A Master's or PhD degree in Computer Science (or related field) is a plus.

7+ years of experience in software engineering or cloud infrastructure, with a strong focus on AI/ML infrastructure.

Demonstrable advanced programming skills in a 3GL strongly-typed language like Java, Python, C/C++ or Golang.

Strong understanding of cloud platforms (AWS and Azure), including services relevant to AI/ML (e.g., EC2, S3, EKS, Azure ML, AKS).

Hands-on experience with containerization (Docker) and container orchestration (Kubernetes) in production environments at scale.

Extensive experience in building and managing CI/CD pipelines for infrastructure and ML model deployment (using tools like Jenkins, GitLab CI/CD, etc.).

Strong understanding of networking concepts (VPC, subnets, routing, firewalls) and experience configuring network infrastructure in the cloud.

Experience with infrastructure monitoring and alerting tools (e.g., Prometheus, Grafana, CloudWatch, Azure Monitor).

Strong scripting skills (Python, Bash) for automation and configuration management.

Strong communication and collaboration skills, with the ability to work effectively in a team environment.

The base pay range represents the anticipated low and high end of the pay range for this position. Actual pay rates will vary and will be based on various factors, such as your qualifications, skills, competencies, and proficiency for the role. Base pay is one component of WEX's total compensation package. Most sales positions are eligible for commission under the terms of an applicable plan. Non-sales roles are typically eligible for a quarterly or annual bonus based on their role and applicable plan.

WEX's comprehensive and market-competitive benefits are designed to support your personal and professional well-being. Benefits include health, dental, and vision insurance, retirement savings plan, paid time off, health savings account, flexible spending accounts, life insurance, disability insurance, tuition reimbursement, and more. For more information, check out the "About Us" section.

Pay Range: $147,000.00 - $195,000.00