Southern Methodist University
Senior HPC Systems Administrator (HR Title: Systems Administrator III)
Southern Methodist University, Dallas, Texas, United States, 75215
Job Description - Senior HPC Systems Administrator (HR Title: Systems Administrator III) (INF00000175)

Salary Range:
Salary commensurate with experience and qualifications

About SMU:
SMU’s more than 12,000 diverse, high-achieving students come from all 50 states and over 80 countries to take advantage of the University’s small classes, meaningful research opportunities, leadership development, community service, international study, and innovative programs.
SMU is data-driven, and its powerful supercomputing ecosystem, paired with entrepreneurial drive, creates an unrivaled environment for the University to deliver research excellence.
Now in its second century of achievement, SMU is recognized for the ways it supports students, faculty, and alumni as they become ethical, enterprising leaders in their professions and communities.

About the Department:
SMU supports some of the state’s leading high-performance computing (HPC) clusters. The M3 cluster boasts 1,077 TFLOPS, 181 nodes, 22,892 CPU cores, 122,880 accelerator cores, and 200 Gb/s bandwidth. Meanwhile, the NVIDIA DGX SuperPOD offers 1,644 TFLOPS, 20 nodes, 2,560 CPU cores, 1,392,640 accelerator cores, and 200 Gb/s bandwidth.

About the Position:
This role is an on-campus, in-person position.
Dedicated to supporting SMU's research community, the Senior System Administrator for High Performance Computing (HPC) works exclusively to design, build, maintain, operate, and manage HPC systems at SMU. This position shares responsibility for university HPC technical support as a member of a two-person HPC systems infrastructure team. This position also assists with Enterprise Linux support.
This position provides hardware, software, and end-user support for SMU's growing number of research faculty and center compute resources dedicated to the advancement of SMU research activities.

Essential Functions:
Design, plan, deploy, and administer services, and troubleshoot issues related to HPC services for research at SMU.
Install and maintain cluster environments and provision systems using automated installation methods.
Manage and maintain the Lustre parallel file system and NFS storage.
Manage and maintain the InfiniBand high-performance interconnect fabric.
Configure, manage, and monitor the SLURM scheduling and queuing system.
Develop and maintain programs and scripts that aid in the operation and automation of administrative tasks, using various shell and scripting languages (bash, Perl, Python) required by systems dedicated to research.
Compile, install, and port software needed by SMU researchers.
Build and deploy open source and vendor/commercial software required by researchers.
Document all configurations, procedures, and changes.
Diagnose and resolve system and operational problems with research systems.
Coordinate with vendors to resolve hardware and software problems.
Keep current with research computing and HPC technology trends and best practices.

Qualifications:

Education and Experience:
Bachelor’s degree is required. A minimum of six years of full-time Linux system administration experience in a large computing environment is required.

Knowledge, Skills and Abilities:
Candidate must demonstrate clear, professional communication to work with team members and customers of diverse technical abilities. Experience with NVIDIA DGX, containers, and Kubernetes is desired. Direct experience working with InfiniBand and knowledge of configuration and management of SLURM or other scheduling and queuing systems is required.
This position participates in a 24-hour, 7-day on-call support rotation and off-hours maintenance windows.

Preferred Skills:
Familiarity with DDN hardware and the Lustre file system
Proficiency in supporting NVIDIA/Mellanox InfiniBand networks
Competence with Bright Cluster Manager
Knowledge of NVIDIA DGX systems
Experience with Kubernetes

Physical and Environmental Demands:
Sit for long periods of time

Deadline to Apply:
Priority consideration might be given to submissions received by December 4, 2024.

EEO Statement:
SMU will not discriminate in any program or activity on the basis of race, color, religion, national origin, sex, age, disability, genetic information, veteran status, sexual orientation, or gender identity and expression.