Icahn School of Medicine at Mount Sinai

Senior High Performance Computing System Administrator

Icahn School of Medicine at Mount Sinai, New York City, NY, United States


Strength Through Diversity

Groundbreaking science. Advancing medicine. Healing made personal.

Roles & Responsibilities:

The Scientific Computing and Data group at the Icahn School of Medicine at Mount Sinai partners with scientists to accelerate scientific discovery. To achieve these aims, we support a cutting-edge high-performance computing and data ecosystem along with MD/PhD-level support for researchers. The group is composed of a high-performance computing team, the research clinical data warehouse team and a research data services team.

The Senior HPC Administrator, High Performance Computational and Data Ecosystem, is responsible for a computational and data science ecosystem for researchers at Mount Sinai. This ecosystem includes high-performance computing (HPC) systems, clinical research databases, and a software development infrastructure for local and national projects. To meet Sinai’s scientific and clinical goals, the Senior HPC Administrator has a good technical understanding of computational, data and software development systems along with a strong focus on customer service for researchers. The Senior HPC Administrator is an expert troubleshooter and productive team member who independently leads projects to effective and efficient completion with little to no supervision. This position reports to the Director for Computational & Data Ecosystem in Scientific Computing. Specific responsibilities are listed below.

Responsibilities

  • Designs, deploys and maintains Scientific Computing’s computational and data science ecosystem, including ~30,000 cores with high-bandwidth, low-latency interconnects, GPUs, large shared-memory nodes, databases, scientific workflows and 30+ petabytes of storage in production, clinical data warehouse and software development environments.
  • Leads the troubleshooting, isolation and resolution of all technical issues (including application, system, hardware, software, and network issues). Actively monitors the systems.
  • Maintains, tunes and manages computational, data, cloud technologies and workflow systems for ISMMS researchers, scientists and their external collaborators. Defines and deploys a comprehensive computational and data vision. Identifies and communicates system advantages/disadvantages and tradeoffs.
  • Designs, develops and implements system administration tasks, including hardware and software configuration, configuration management, system monitoring (including the development and maintenance of regression tests), usage reporting, system performance (file systems, scheduler, interconnect, high availability, etc.), security, networking and metrics.
  • Collaborates effectively with research and hospital system IT, compliance, HIPAA, security and other departments to ensure compliance with all regulations and Sinai policies.
  • Participates in the integration of HPC resources with laboratory equipment such as sequencers, clinical and research data resources and systems, etc. Incorporates and links data and compute resources.
  • Researches, deploys and optimizes resource management and scheduling software and policies, and actively monitors them. Designs, tunes, manages and upgrades parallel file systems, storage and data-oriented resources.
  • Researches, deploys and manages security infrastructure, including development of policies and procedures.
  • Maintains all necessary aspects of HPC in accordance with best practices. Develops and implements backup policies.
  • Prepares and manages budgets for hardware, software and maintenance. Participates in chargeback/fee recovery analysis and provides suggestions to make operations sustainable.
  • Assists in developing and writing system design for research proposals. Creates and provides clear documentation.
  • Works effectively and productively with other team members within the group and across Mount Sinai.
  • Performs related duties as assigned or requested.
  • Provides after hours support for critical system and production issues.
  • Answers and resolves user tickets.

Qualifications:

  • Bachelor's degree in computer science, engineering or another scientific field. Master's or PhD preferred
  • 8+ years (more preferred) of progressive HPC system administration and operations experience (preferably in a Red Hat/CentOS Linux, batch HPC cluster environment)
  • Must be an expert troubleshooter; must be a team player and customer focused
  • Experience with a job scheduler such as LSF or Slurm, and with parallel file systems and storage
  • Experience with networking and security
  • Experience with configuration management systems such as xCAT, Puppet and/or Ansible
  • Experience with databases and web services
  • Experience with InfiniBand and Gigabit Ethernet
  • Experience in an academic or research community environment
  • Scripting and programming experience
  • Experience with cloud computing
  • Ability to multitask effectively in a dynamic environment
  • Excellent communication skills, analytical ability, strong judgment and management skills, and the ability to work effectively as a liaison between research and technology teams
  • Strong written, oral, and interpersonal communication skills

Preferred Experience

  • Advanced degree
  • Experience with GPFS, LSF, TSM, and InfiniBand (IB) and Ethernet networking
  • Experience with databases and web services is highly preferred

Strength Through Diversity

The Mount Sinai Health System believes that diversity, equity, and inclusion are key drivers for excellence. We share a common devotion to delivering exceptional patient care. When you join us, you become a part of Mount Sinai’s unrivaled record of achievement, education, and advancement as we revolutionize medicine together. We invite you to participate actively as a part of the Mount Sinai Health System team by:

  • Using a lens of equity in all aspects of patient care delivery, education, and research to promote policies and practices to allow opportunities for all to thrive and reach their potential.
  • Serving as a role model confronting racist, sexist, or other inappropriate actions by speaking up, challenging exclusionary organizational practices, and standing side-by-side in support of colleagues who experience discrimination.
  • Inspiring and fostering an environment of anti-racist behaviors among and between departments and co-workers.

At Mount Sinai, our leaders strive to learn, empower others, and embrace change to further advance equity and improve the well-being of staff, patients, and the organization. We expect our leaders to embrace anti-racism, create a collaborative and respectful environment, and constructively disrupt the status quo to improve the system and enhance care for our patients. We work hard to create an inclusive, welcoming and nurturing work environment where all feel they are valued, belong and are able to advance professionally.

Explore more about this opportunity and how you can help us write a new chapter in our history!

EOE Minorities/Women/Disabled/Veterans