Idaho State Job Bank
Senior Principal Software Developer - Cluster Networks (JoinOCI-SDE)
Boise, Idaho, United States, 83708
Senior Principal Software Developer - Cluster Networks (JoinOCI-SDE) at Oracle in Boise, Idaho, United States

Job Description

The Oracle Cloud Infrastructure (OCI) Cluster Networking team is building the ultra-high-performance network required to support AI/ML/HPC workloads. This is your opportunity to join the AI revolution and design systems that allow customers to scale from tens to thousands of GPUs without compromising on performance. The team will be responsible for designing, developing, and performance-tuning the software and hardware stack required to run distributed AI/ML/HPC workloads across thousands of GPUs, using libraries such as NCCL over a high-performance network.

You will build innovative solutions for our customers from the ground up. These are exciting times: our team is still young, growing fast, and working on ambitious new initiatives. We are looking for adaptable, self-motivated engineers with the ability to learn quickly. You should be both a rock-solid developer and a distributed systems generalist, able to dive deep into any part of the stack, including low-level systems, as well as design broad distributed system interactions. You should value simplicity and scale, work comfortably in a collaborative, agile environment, and be excited to learn.

Career Level - IC5

Responsibilities

Basic Qualifications:
+ 10+ years of experience in software (systems/application) development
+ 2+ years of experience with collective communication libraries such as NCCL, RCCL, and MPI, and with GPU frameworks such as CUDA and ROCm
+ 2+ years of experience with ML training frameworks such as PyTorch and TensorFlow
+ Proficient at programming in any two of C/C++, Python, Java, Scala, and Go
+ Proficient with data structures, algorithms, and operating systems
+ Excellent organizational, verbal, and written communication skills
+ Bachelor's degree in Computer Science and Engineering or a related engineering field

Preferred Qualifications:
+ Master's or PhD degree in Computer Science or a related engineering field
+ Experience with RDMA programming, including but not limited to GPUDirect RDMA
+ Experience with distributed workload managers such as Slurm or Kubernetes (K8s)
+ Experience with Linux performance tools
+ Experience in SDN, NFV, and cloud networking
+ Experience with Infrastructure-as-a-Service platforms such as OpenStack, AWS, GCP, and Azure

Disclaimer: Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting are specific to th