Hofstra University

HPC Systems Administrator, School of Engineering

Hofstra University, Hempstead, New York, United States


About Hofstra:

Hofstra University is nationally ranked and recognized as Long Island's largest private university located in Hempstead, N.Y. When you work at Hofstra, you join a team of talented professionals committed to preparing students for the challenges of tomorrow, in an environment that cultivates learning through the free and open exchange of ideas for the betterment of humankind. The work we do at Hofstra supports the education and well-being of our students, and the workforce of the future. While working towards this mission, employees can take advantage of many enriching experiences on campus. Whether it's a lunchtime lecture, a Division I NCAA athletics game, a musical concert, a theatre performance, or a visit to one of our two accredited museums, there is always something exciting to do at Hofstra. Enjoy the ease of going to the fitness center, taking a swim, or grabbing a bite to eat without having to leave our beautiful campus! Hofstra University is dedicated to recruiting and retaining a highly qualified and diverse academic community of students, faculty, staff, and administrators respectful of the contributions and dignity of each of its members. We especially encourage women, people of color, members of the LGBTQ+ community, veterans, and people with disabilities to apply.

Position Title:

HPC Systems Administrator, School of Engineering

School/Division:

School of Engineering

Full Time/Part Time:

Full-Time

Description:

Reporting to the National Science Foundation Principal Investigator, the HPC Systems Administrator is responsible for the management of the new 'Star' high-performance computing (HPC) environment at Hofstra University. Responsibilities will include managing configuration, monitoring, optimizing performance, troubleshooting, and ensuring security and high availability of the cluster, as well as providing technical assistance and supporting the research community by collaborating with researchers to understand their needs and facilitating training to support effective use of the cluster. The HPC Systems Administrator will manage the Linux environment, hardware, network infrastructure, and software components of the HPC systems. This role requires expertise in systems administration and effective communication skills in engaging with researchers and providing technical guidance to students and faculty to help them explore, assess, and pinpoint technology solutions. Daily operations will include logging changes, managing job scheduler policies, monitoring job performance, advising and communicating with clients, installing software, applying patches, auditing, testing, troubleshooting, repair, maintenance, etc. The HPC Systems Administrator must be responsive to evolving research and class needs, and able to manage or provide services that have a broad business impact. This position is on-site during normal business hours but may require periodic remote work and occasional work during non-standard hours for system maintenance and urgent issues. This is a grant-funded position for a two-year period. This position is contingent upon grant funding.

Responsibilities include, but are not limited to:

Manages all operational aspects of the cluster, including installing, configuring, maintaining, and administering software and hardware components.
Designs and plans for the implementation of future expansions and integrations.
Administers the scheduler and its policies to ensure efficient job scheduling and resource allocation.
Performs regular patching, updates, and maintenance of the cluster to ensure optimal performance and security.
Performs Linux administration tasks, including software installation, managing system configuration, writing/maintaining shell scripts, analyzing logs, diagnosing and resolving issues, performance tuning, and system maintenance.
Deploys, administers, and monitors containerized applications.
Conducts management and monitoring of cluster nodes.
Serves as the primary contact for HPC inquiries, requests, and technical support.
Supports clients and their applications through consultation and project planning.
Communicates with researchers, faculty, and students to provide support and understand their needs.
Advises clients on technical design and implementation of technology solutions for classes and research.
Troubleshoots and resolves a variety of complex system and job execution issues.
Conducts system monitoring to assess system health, proactively identify potential issues, optimize performance, and maintain overall system integrity, stability, and availability.
Meets with stakeholders to review cluster usage, understand challenges, discuss policies, plan changes, and anticipate upcoming computing needs.
Develops and manages processes to streamline operations.
Implements and oversees security measures to protect data and uphold privacy standards.
Coordinates and maintains periodic backups of systems.
Maintains documentation of system configuration, changes, operating procedures, cluster components, troubleshooting instructions, and resolutions to promote knowledge transfer and teamwork.
Writes guides to direct users through common tasks and procedures.
Onboards new users, including account setup, access control, and customized user environment configurations.
Collaborates with partners and vendors to operate and maintain the cluster and resolve issues.
Provides user consultation and training to support effective use of the HPC resources and enhance research outcomes.
Handles periodic on-call duty as well as out-of-band requests.
Performs other related duties as assigned.

Qualifications:

Bachelor's degree in Computer Science or related field required.
At least 3 years of relevant experience with Linux systems administration, including working with kernel modules.
Fluency in multiple programming languages, including solid skills in shell scripting.
Experience with Slurm administration and HPC clusters.
Strong problem-solving and troubleshooting skills.
Effective written and oral communication skills.
Proficiency with data center GPU management and domain-specific tools.
Proficiency with command line tools.
Ability to work independently as well as collaboratively within a team.

Preferred Qualifications:

Familiarity with Message Passing Interface (MPI) and parallel computing environments.
Experience with containerization technologies such as Singularity/Apptainer.
Knowledge of networking, storage, and parallel file systems in a clustered environment.

EEO Statement:

Hofstra University is an equal opportunity employer, committed to fostering diversity in its faculty, administrative staff and student body, and encourages applications from the entire spectrum of a diverse community.