Oak Ridge National Laboratory

HPC Software Engineer (Hybrid Eligible)

Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States, 37830


We're hiring an HPC Software Engineer to support the integration of computing hardware and software tools for accomplishing research tasks across a variety of scientific research areas! This position is in the Emerging Technologies & Computing (ETAC) Group in the Research Computing Support Division (RCSD) of the Information Technology Services Directorate (ITSD) at Oak Ridge National Laboratory (ORNL).

Our HPC engineering team facilitates the mission of ORNL through HPC systems engineering, integration, and support for the research community at ORNL. By providing design, deployment, optimization, monitoring, and tooling support across multiple clustered infrastructures, we facilitate Lab-wide R&D projects. Our HPC clusters range in scope from just a handful of nodes to greater than fifty thousand cores.

What will you be doing?

ETAC focuses on supporting ORNL researchers’ needs in HPC, Data Engineering and Management, Infrastructure as a Service, and new technology. You'll be working directly with researchers, supporting their science, understanding their scientific problems, and applying advanced research computing tools to help achieve research outcomes. HPC scientific software engineers play a crucial role in optimizing computational methods and facilitating groundbreaking research across multiple scientific areas. As an HPC team member, you will recommend computational and/or visualization tools, techniques, and methodologies for the scientific computing aspect of research investigations.

Major Duties/Responsibilities:

Scientific Software and Application Management:

Understand scientific software users’ requirements: work closely with researchers to understand their computational needs and translate them into efficient HPC applications. Analyze application performance to identify bottlenecks and develop strategies to improve scalability and efficiency on HPC systems. This may involve profiling code, analyzing communication patterns, and tuning system parameters.
Install and manage scientific software: deploy and maintain a wide range of scientific applications, libraries, and development tools on HPC systems to support research activities.
Develop custom tools and scripts: develop tools to automate common tasks, improve systems management, and facilitate sophisticated computational workflows. Develop, maintain, and install software for HPC and data-intensive architectures, including Graphics Processing Units (GPUs), parallel systems, and other computing environments.

User support and collaboration:

Provide software technical support: collaborate with HPC support and scientists on technical issues related to scientific software problems. Following industry standards, implement HPC software with novel programming and optimization techniques. Provide solutions and technical recommendations for code optimization, resource utilization, and system tuning.
Collaborate on research projects: work closely with researchers to understand their computational requirements and assist in developing efficient computational strategies, code optimization, and parallelization. This includes working with a highly diverse and multidisciplinary team (such as mathematicians, physicists, computer scientists, and engineers) in the research, development, integration, testing, and deployment of research software, data platforms, and machine learning systems for large-scale data analysis.
Research information dissemination: support research staff in disseminating results in peer-reviewed journals, technical reports, relevant conferences, and open-source software project repositories.

Research and development:

Stay informed about the latest research in HPC and AI.
Develop and recommend ideas for new programs, products, and features by staying abreast of new technology developments and trends.

Partnerships and collaboration:

As applicable and possible, establish and maintain partnerships and collaborations with industry, other groups at ORNL, and HPC networks to share knowledge and best practices.

Deliver ORNL’s mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service. Promote diversity, equity, inclusion, and accessibility by encouraging a respectful workplace in how we treat one another, work together, and measure success.

Basic Qualifications:

A BS in computer science, computer engineering, information systems, or a related field of study and five (5) to seven (7) years of proven and aligned experience is required. An overall combination of equivalent experience may be considered.

Three (3) or more years of demonstrated abilities in the following areas:

High Performance Computing (HPC) environments and HPC scheduling software.
Software development including version control using Git with open-source tools and software.
Python and data analysis modules such as Pandas, NumPy, and Dask.
Developing software in C/C++, Fortran, or other programming languages.

Preferred Qualifications:

In-depth understanding of HPC architectures and their optimization techniques.

Experience in the following areas:

Optimizing and parallelizing software products for HPC using MPI or other open-source tools.
HPC debugging tools such as DDT, GDB, or Valgrind.
AI toolkits such as PyTorch, RAPIDSAI, TensorFlow, or Keras.
Statistical analysis software such as Python or R.
Building and running containerized applications in an HPC environment.
Cluster deployment tools such as Warewulf, PXEboot, and/or Bright.
Managing systems.
Working in a government, scientific, or other highly technical environment.

Knowledge of multiple operating systems including Linux.
Exposure to microservices concepts and understanding of container environments including Podman, Docker, and Kubernetes.
Proven ability to balance sophisticated research and security requirements.

Special Requirements:

Visa sponsorship is not available for this position.

This position requires the ability to acquire and maintain a clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program.

Benefits at ORNL:

ORNL offers competitive pay and benefits programs to attract and retain dedicated people. The laboratory offers many employee benefits, including medical and retirement plans and flexible work hours, to help you and your family stay happy and healthy. Employee amenities such as on-site fitness, banking, and cafeteria facilities are also provided for convenience.

Other benefits include the following: Prescription Drug Plan, Dental Plan, Vision Plan, 401(k) Retirement Plan, Contributory Pension Plan, Life Insurance, Disability Benefits, Generous Vacation and Holidays, Parental Leave, Legal Insurance with Identity Theft Protection, Employee Assistance Plan, Flexible Spending Accounts, Health Savings Accounts, Wellness Programs, Educational Assistance, Relocation Assistance, and Employee Discounts.

In addition, we offer a flexible work environment that supports both the organization and the employee. A hybrid/onsite working arrangement may be available with this position.

Having difficulty using the online application system or need an accommodation to apply due to a disability? Please email ORNLRecruiting@ornl.gov or call 1.866.963.9545.

This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.

ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.
