Numerical Algorithms Group

HPC Platform Engineer

Numerical Algorithms Group, Houston, Texas, United States, 77246


Are you a High Performance Computing (HPC) Platform Engineer who wants to collaborate with a team of qualified, friendly, and supportive individuals on a wide range of exciting projects? Do you have the skills to build, run, and tune HPC systems?

If you are seeking employment in a long-standing, successful company that values teamwork, offers family-friendly working and flexible schedules, and understands the need for work/life balance, then NAG could be the perfect fit for you! NAG is a market leader in technical software and high-performance computing, and this is an exciting opportunity to contribute to our fast-growing HPC services team.

The ideal candidate will be an innovative thinker who can go beyond traditional methods. Additionally, they should possess strong communication skills, including the ability to present complex technical topics to both technical and non-technical audiences. This role requires flexibility and versatility to work on several projects, while prioritizing business needs based on outcomes.

Key Responsibilities

Designing, deploying, and maintaining high-performance computing environments that support complex computational workloads.
Configuring, optimizing, and managing HPC clusters, storage systems, and networking components to ensure peak performance and scalability.
Troubleshooting hardware and software issues, implementing security measures, and collaborating with data scientists, researchers, and developers to streamline workflows.
Monitoring system performance, performing upgrades, and ensuring the platform meets the needs of users and organizational goals.

About You / The Desired Candidate Will Have:

Qualifications and Experience:

A BSc (or equivalent) in an applied math, computer science, computational science, or engineering subject.
2+ years of experience deploying and administering HPC clusters.
Great communication skills and the ability to understand computational scientists and their domain-specific jargon.
Solid understanding of HPC and accelerated computing with engineering or academic research communities.
Some experience with network-distributed multi-node applications.
Understanding of the basic building blocks of modern HPC clusters: types of network fabrics, types of parallel filesystems, memory hierarchies, accelerators.
Deep understanding of operating systems, computer networks, and HPC applications.
C/C++/Python/Bash programming and scripting experience.
Knowledge of automation tools, including GitLab CI/CD pipelines.
Experience with scheduling and resource management systems (Slurm, Grid Engine, or similar).
Experience with parallel filesystems (Lustre).
Ability to multitask effectively in a dynamic environment.
Strong understanding of the Linux operating system environment and tools.
Experience with HPC workflows that use MPI.
Experience with package installation and management tools such as Conda, Spack, and RPM.

What Will Make You Stand Out:

Strong understanding of, and previous involvement with, Slurm configuration.
Exposure to container technology for HPC applications.
Experience with CUDA programming and GPU-oriented code optimization.
Experience choosing appropriate on-prem vs. cloud utilization based on factors such as cost, data gravity, and resource availability.

Why Join Us?

We provide a comprehensive benefits package including a competitive salary (dependent on your experience), a 401(k) plan with company match up to 5%, and health, dental, life, and short-term and long-term disability insurance. Additionally, we offer 25 vacation days, 5 of which must be taken between the Christmas and New Year’s holidays, as well as paid sick days and maternity and paternity leave.

NAG is an equal opportunity employer, has a dedicated Women in Tech team, and is a founder Chapter of Women in HPC (WHPC). We strongly believe that a diverse workforce contributes to our ability to develop innovative products and services. To promote inclusivity and diversity, we employ a blind recruitment process, redacting all information that could introduce conscious or unconscious bias during the shortlisting process.

About NAG

NAG provides industry-leading numerical software and technical services to banking and finance, energy, engineering, and market research, as well as academic and government institutions. World renowned for the NAG Library – the most rigorous and robust collection of numerical algorithms available – NAG also offers Automatic Differentiation, Machine Learning, and Mathematical Optimization products, as well as world-class technical consultancy across HPC and Cloud HPC, code porting and optimization, and other areas of numerical computing. Founded more than 50 years ago from a multi-university venture, NAG is headquartered in Oxford, UK with offices in the UK, US, EU and Asia.

How to Apply

Apply for the position by submitting your CV with a cover letter. If you’re interested in the role but would like more details, email us to schedule an informal chat. We are open to those seeking a part-time position.
