Apex Systems

HPC Software Support Engineer

Apex Systems, Rockville, Maryland, US, 20849


Apex is searching for an HPC Software Engineer with 5+ years of experience. The candidate must be able to obtain a Public Trust and commute to Rockville, MD a few times per year. If interested, email me at jobarker@apexsystems.com. Thanks :)

HPC Software Support Engineer Job Description

Responsibilities:
- Work with a GPU-focused 4,000+ core HPC cluster and a 1,500+ core HPC cluster, supporting their hardware and operating system environments
- Support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI/ML
- Monitor the portfolio of software applications and proactively plan upgrades and license renewals
- Monitor and report on cluster performance, and generate data that shows usage and trends
- Triage support requests from the research community, and work with others on the Scientific Infrastructure team to resolve issues and complete service requests
- Collaborate with researchers to guide them in effective use of HPC resources, such as job scheduler submission, data formats, and building data workflows
- Engage with researchers to understand their HPC needs, including data life cycle management, integration of scientific instruments with HPC, and storage capacity and compute requirements
- Provide input to the Scientific Infrastructure team leader on priorities for cluster operations, scheduling policies, required resources, etc.
- Attend and actively participate in daily stand-up meetings to report progress, discuss obstacles, and coordinate tasks with other team members
- Work collaboratively in a team environment to achieve project goals
- Engage in open communication, share knowledge, and support fellow teammates
- Provide feedback and contribute to the continuous improvement of team processes

What You’ll Need to Succeed:

Education: BS/BA (or equivalent)

Required Experience: Five years of related experience

Required Technical Skills:
- Minimum of five years of experience with servers, datacenters, networking, and related technologies
- Minimum of five years of experience managing Linux systems
- Experience with the Spack package manager, including creating packages from PyPI, R, and GitHub sources
- Experience installing and packaging GPU applications, and optimizing job submission scripts used for ML model training, data mining operations, or high-resolution graphics rendering
- Experience with Python scripting
- Experience using Git distributed workflows
- Experience using Ansible to manage system configuration
- Experience with Terraform for provisioning systems
