University of Washington
HPC SYSTEMS ENGINEER
University of Washington, Seattle, Washington, US, 98127
If you are using a screen reader and experience any difficulty accessing our web pages, please call 206-543-2544 or email UWHires and we will be happy to assist you.

Benefits:
As a UW employee, you will enjoy generous benefits and work/life programs. For a complete description of our benefits for this position, please visit our website.

As a UW employee, you have a unique opportunity to change lives on our campuses, in our state, and around the world. UW employees offer their boundless energy, creative problem-solving skills, and dedication to build stronger minds and a healthier world. By being deeply invested in our work, showing compassion in our interactions, and embodying the spirit of a team player, each member contributes to a thriving community. UW is committed to attracting and retaining a diverse staff; your experiences, perspectives, and unique identities will be honored at the University of Washington.

UW Information Technology (UW-IT) is the central information technology organization for the University of Washington, responsible for strategic planning, oversight, and direction of the UW’s IT infrastructure, resources, and services. UW-IT provides critical technology support to all three campuses, UW Medicine, and research operations around the world, partnering with the UW community to enable innovation, learning, discovery, and service.

The Research Cyberinfrastructure group within the University’s central IT organization designs, implements, and operates Linux-based High-Performance Computing (HPC) clusters and storage systems to offer UW researchers cost-effective yet powerful high-performance computing options that unlock new possibilities for their research.

This position focuses on supporting HPC efforts for the research computing group, while providing expertise and support to other endeavors when needed. This position requires a team-oriented professional, experienced in managing large IT systems for research using automated and software-defined approaches. This position regularly interfaces with other UW-IT teams as well as research customers across campus.

Expertise in process automation, software development, Linux system administration, attention to detail, eagerness to evaluate and push new technologies, an optimistic “can do” disposition, collaborative growth mindset, and a customer-centric perspective are all key for success in this role.

REQUIREMENTS:
Bachelor’s degree in computer science, information technology, a scientific or engineering field, or a related field, or equivalent experience.
Minimum 4 years’ experience in Linux system administration or substantial experience working with Linux.
Basic knowledge of networking hardware, software, protocols, and concepts.
Demonstrated excellent written/oral communication skills, technical documentation skills, user liaison skills, and personal interaction abilities.
Demonstrated ability to work with minimal supervision, both independently and as part of a team.

DESIRED:
Knowledge of containerization platforms such as Docker or Singularity.
Progressively responsible experience as an engineer, architect, or role with comparable technical responsibilities in a large Linux HPC environment.
Extensive experience with administration of Linux operating systems in a production environment, including experience with Red Hat Enterprise Linux or derivatives such as CentOS or Rocky Linux.
Familiarity with SLURM or another HPC scheduler (PBS Pro, PBS/Torque, SGE/UGE, LSF, etc.).
Experience designing, configuring, and troubleshooting networks using both Ethernet and high-performance interconnects such as InfiniBand.
Proficiency in programming/scripting languages in the context of systems engineering or administration, preferably including Bash, Golang, or Python.
Experience in the configuration and use of mass deployment tools such as MAAS, Foreman, xCAT, Cobbler, Warewulf, or similar.
Experience with configuration management tools such as Ansible, Chef, SaltStack, or Puppet.
Proficiency with the use of Git for source control in collaboration with a team with multiple contributors.
Ability to administer and troubleshoot large high-performance parallel filesystems such as IBM Storage Scale (GPFS), Lustre, BeeGFS, or Ceph.
Experience with use of containers, preferably including use of Apptainer in an HPC environment.
Demonstrated excellent written/oral communication, technical documentation, and user liaison skills.
Experience in a data center environment (e.g., racking equipment, running cables, labeling, asset tracking).
Scientific background, research experience, and/or experience in a University setting.

CONDITIONS OF EMPLOYMENT:
Requires monitoring of e-mail and trouble ticket system for questions needing immediate response during business hours.
On-call responsibilities for after-hours system outages.
Server management will include both production (24x7x365) and development systems.
Open office environment.
Hybrid: expected to be in the office a minimum of two days per week.
This is an essential position and is required to work remotely when the University suspends operations.

Application Process:
The application process may include completion of a variety of online assessments to obtain additional information that will be used in the evaluation process. These assessments may include Work Authorization, Cover Letter and/or others. Any assessments that you need to complete will appear on your screen as soon as you select “Apply to this position”. Once you begin an assessment, it must be completed at that time; if you do not complete the assessment, you will be prompted to do so the next time you access your “My Jobs” page. If you select to take it later, it will appear on your "My