FlexAI

AI Cluster Network Architect · R&D HW · San Francisco (Bay Area)

FlexAI, San Francisco, California, United States, 94199


Join FlexAI:

FlexAI is at the forefront of revolutionizing AI computing by reengineering infrastructure at the system level. Our groundbreaking architecture, combined with sophisticated software intelligence, abstraction, and an orchestration layer, allows developers to leverage a diverse array of compute, resulting in more efficient, more reliable computing at a fraction of the cost.

The rapid evolution of machine intelligence has created a need for a new system architecture capable of handling high memory capacity and bandwidth. These are critical bottlenecks in pushing machine intelligence to the next level, where compute demand is expected to increase up to 1000 times current levels.

FlexAI has pioneered a groundbreaking solution to tackle these memory challenges. Our innovative compute architecture provides a well-balanced distribution of memory bandwidth, capacity, and compute density, ensuring maximum utilization of system resources. This architecture is the cornerstone of our datacenter-in-a-box concept, which is enabled by our universal AI compute cloud service. Our hardware solutions are built for seamless deployment with our own AI cloud offerings and other cloud service providers worldwide, setting new standards in performance and efficiency.

We are seeking a skilled and experienced AI Cluster Network Architect to design, optimize, and scale our AI infrastructure, driving the development and efficiency of our groundbreaking compute architecture.

Position Overview:

As the AI Cluster Network Architect, you will be responsible for designing and implementing high-performance network architectures that support AI clusters, ensuring efficient communication between nodes, optimized data flow, and scalability. This role requires a deep understanding of AI workloads, networking protocols, and distributed computing. You will work closely with AI researchers, data scientists, and infrastructure teams to deliver networking solutions that meet the demands of advanced AI systems.

Success at FlexAI requires an entrepreneurial spirit and startup mindset: the ability to rapidly iterate and make meaningful progress while staying focused on our mission to deliver more compute with less complexity. Your proven expertise in cultivating influence, aligning diverse stakeholders, and driving efficient operations while fostering a supportive environment through mentorship and thoughtful leadership of a growing team will be critical to your success.

What you’ll do:

- Design and implement high-performance network architectures for AI clusters, ensuring low-latency, high-throughput data communication between nodes.
- Collaborate with AI engineers, data scientists, and IT teams to understand AI workloads and optimize network infrastructure accordingly.
- Architect scalable and fault-tolerant networking solutions that can handle massive datasets and distributed AI computations.
- Evaluate and integrate advanced networking technologies, including RDMA (Remote Direct Memory Access), InfiniBand, and high-speed Ethernet.
- Ensure that network architectures are optimized for AI-specific workloads, such as deep learning, machine learning, and data-intensive computing.
- Monitor and troubleshoot network performance issues, implementing solutions to maintain optimal performance across AI clusters.
- Work closely with hardware engineers and datacenter teams to design and implement networking infrastructure that supports AI training and inference.
- Drive continuous improvement in network design, focusing on scalability, security, and performance.
- Stay updated on the latest trends and innovations in AI networking, contributing to the development of best practices and new technologies.
- Model inclusive behaviors and contribute to a culture that values and respects different backgrounds and perspectives.

What you’ll need to be successful:

- Bachelor’s or Master’s degree in Computer Science, Network Engineering, or a related field. Advanced degrees are a plus.
- 8+ years of network architecture experience, focusing on high-performance computing (HPC) or AI infrastructure.
- Proven experience designing and implementing large-scale network architectures for AI clusters or distributed computing environments.
- Deep knowledge of networking protocols, including TCP/IP, RDMA, InfiniBand, and high-speed Ethernet.
- Experience with AI workloads, including deep learning and machine learning, and their impact on network design.
- Strong understanding of network security best practices and strategies for protecting AI infrastructure.
- Hands-on experience with network monitoring tools and techniques for performance optimization and troubleshooting.
- Ability to collaborate effectively with cross-functional teams, including AI engineers, data scientists, and infrastructure teams.
- Strong problem-solving skills and a data-driven approach to decision-making.

Preferred Skills:

- Experience with containerized environments (e.g., Kubernetes) and their networking challenges in AI clusters.
- Knowledge of cloud-based AI infrastructure and hybrid cloud networking solutions.
- Familiarity with network simulation and modeling tools for AI workloads.
- Experience with emerging networking technologies, such as Software-Defined Networking (SDN) and Network Function Virtualization (NFV).

What we offer:

- A competitive salary and benefits package, tailored to recognize your dedication and contributions.
- The opportunity to collaborate with leading experts in AI and cloud computing, learning from the best and the brightest, fostering continuous growth.
- An environment that values innovation, collaboration, and mutual respect.
- Support for personal and professional development, empowering you with the tools and resources to elevate your skills and leave a lasting impact.
- A pivotal role in the AI revolution, shaping the technologies that power the innovations of tomorrow.

About FlexAI:

FlexAI was founded by Brijesh Tripathi and Dali Kilani, who bring experience from Nvidia, Apple, Tesla, Intel, Lifen, and Zoox. FlexAI is not just building a product; we’re shaping the future of AI.

Offices:

Our teams are strategically distributed across three continents (Europe, North America, and Asia), united by a shared mission: to deliver more compute with less complexity.

- Paris - HQ
- San Francisco (Bay Area) - US office
- Bangalore - India office

Apply NOW!

You’ve seen what this role entails. Now we want to hear from you! Does this opportunity align with your aspirations? If you’re even slightly curious, we encourage you to apply – it could be the start of something extraordinary!

At FlexAI, we believe diverse teams are the most innovative teams. We’re committed to creating an inclusive environment where everyone feels valued, and we proudly offer equal opportunities regardless of gender, sexual orientation, origin, disabilities, veteran status, or any other facets of your identity that make you uniquely you.
