Scuttlebutt Services, LLC

High Performance Computing (HPC) System Engineer

Scuttlebutt Services, LLC, Annapolis Junction, Maryland, United States, 20701


Annapolis Junction, MD - Salary Range 195k-225k (TS/SCI w/ Full Poly)

Job Brief

We have multiple openings for Computer/Systems Engineers in Annapolis Junction, MD. We are looking for High Performance Computing (HPC) designers and developers to join a highly skilled, high-performing agile team supporting a nationally significant, fast-paced program. The focus is on developing a range of streamlined, collaborative applications for cybersecurity and analytics that share data across agencies within the Intelligence Community (IC).

Responsibilities

- Requirements Gathering: Confer with other computer, systems, and software engineers to analyze complex requirements; use design software tools; provide support using formal specifications, data flow diagrams, and other accepted design techniques; and use engineering principles to provide full systems lifecycle support for the growing HPC compute infrastructure
- Software Development: Shape the design, development, and/or modification of HPC software solutions by analyzing system performance standards; confer with users and with computer/systems or software engineers; analyze systems flow, data usage, and work processes; and investigate problem areas
- Algorithms: Develop or implement algorithms to address HPC system performance and functional standards
- Documentation: Review HPC software and system documentation and recommend improvements to existing documentation and to software/system development process standards
- Quality Control: Ensure quality control of all developed and modified HPC software and hardware

Requirements

- Active TS/SCI clearance with full scope polygraph
- Bachelor's degree in a STEM field or similar technical discipline
- Knowledge of and experience with HPC concepts, including cluster architecture, parallel file systems, and high-speed networking
- Demonstrated ability to provision and configure HPC environments and components
- Solid understanding of accelerated computing scheduling and I/O stacks
- Broad and deep understanding of the issues that affect GPU performance, CPU performance, and scaling performance
- Proficiency with:
  - Agile/Scrum software development methodologies and team collaboration
  - Linux (Red Hat/CentOS), including OS, CLI (Command Line Interface), system administration, networking, storage, and security
  - Writing Linux-based scripts to facilitate application integration
  - Lightweight Directory Access Protocol (LDAP)
  - TCP/IP fundamentals
  - HPC workflows that use Message Passing Interface (MPI)
  - Languages, libraries, and tools used in HPC (C++, C, modern Fortran, HIP, CUDA, Python, MPI, OpenMP, etc.)
  - Cluster configuration management tools such as Ansible, Puppet, and Salt
  - Unix cluster and node monitoring tools, including Node Health Check (NHC), Nagios, Grafana, and Prometheus
  - Node.js and the NPM (Node Package Manager) ecosystem
  - Continuous integration and software CM (Configuration Management) processes/tools
  - Container technologies like Docker, Singularity, Shifter, and Charliecloud
- Skilled at generating and reviewing software/technical documentation
- Understanding of Test Driven Development (TDD) and automation tools

Bonus Skills

- A background in Signals Intelligence (SIGINT) is preferred
- Experience working with information security teams to ensure cybersecurity compliance of multi-user systems
- Knowledge of algorithms, methods, software libraries, and other tools commonly used in scientific computation
- Experience with:
  - Bright Computing platform
  - Various MPI implementations (IntelMPI, OpenMPI, MPICH)
  - Fast, multivendor, distributed cluster storage systems like Lustre, GPFS (General Parallel File System), and XFS for HPC workloads
  - Deep learning frameworks like PyTorch and TensorFlow
  - Software Defined Networking
  - Nvidia CUDA libraries and GPUs
  - Virtualization techniques and cloud platform solutions
  - MLPerf benchmarking
  - AI/ML coding
  - Apache NiFi
  - DevOps
  - AWS, Azure, or GCP platforms