Northeastern University
HPC Performance Engineer
Northeastern University, South Boston, Massachusetts, United States
About the Opportunity

This job description is intended to describe the general nature and level of work being performed by people assigned to this classification. It is not intended to be construed as an exhaustive list of all responsibilities, duties and skills required of personnel so classified.

Job Summary:

The Research Computing (RC) team at Northeastern University seeks a motivated, self-starting individual to join our dynamic team as an HPC Performance Engineer and support the incredible growth of RC's user base, computing infrastructure, and ever-increasing need to support Artificial Intelligence/Deep Learning (AI/DL) workloads. The successful candidate will be a key link between the RC team and the research community at NU, including faculty and students across a broad range of departments, as well as outside users and partners, to help them leverage RC resources for their research and teaching.

The HPC Performance Engineer will help improve the overall reliability and efficiency of software applications running on GPUs and other AI accelerators, and optimize the performance of ML models, including large language models (LLMs) and large vision models. The HPC Performance Engineer will provide key support throughout the full development cycle of ML models to ensure their optimal performance for faculty research groups, and will help increase both the adoption and efficient use of the university's HPC resources for AI/DL workloads. This position will work with faculty and researchers, advise members of the research community on best practices, and assist them in getting the most out of NU's cloud and HPC service offerings.

As HPC Performance Engineer, you can also participate in grant funding opportunities and author research papers and presentations with faculty members at Northeastern University.
Additionally, you will have the opportunity to design and lead projects, working directly with RC Graduate Research Assistants (GRAs) and co-op student workers.

Minimum Qualifications:

Requirements

Proficiency in the areas of GPUs and GPU-based software applications; machine learning, modeling, measurement techniques, testing, and statistical methods; and Python.
Strong track record in optimizing application performance for AI accelerators and porting applications across architectures (e.g., NVIDIA GPUs, AMD GPUs, Cerebras).
Ability to work with faculty to build technically focused proposals around HPC solutions.
Excellent time management skills.
Ability to manage multiple projects simultaneously, plan and implement project specifications, report project status, and identify delays or resource shortages.
Ability to communicate effectively with team members and work efficiently with them to achieve daily, weekly, and monthly objectives.
Excellent verbal and written communication skills with an ability to communicate solutions by providing both technical and non-technical interpretations of models and results.

Knowledge and skills required for this role are typically acquired through a combination of formal education and experience:

Bachelor's degree in a computational science or a related field is required; a Master's or PhD degree in HPC or a related field is preferred.
Minimum of 2-3 years of experience in performance optimization.
Minimum of 1-2 years of experience in high performance computing.

Additionally, experience should include:

linking the performance of hardware and software components of scientific applications, with emphasis on GPUs and GPU-based applications;
working with optimization techniques including SIMD (SSE, AVX), vectorization, loop dependencies, multithreading, multi-processor usage, and tensor cores;
working with diverse communities regarding complex computing requirements and capabilities;
working with batch management systems (e.g., Slurm, PBS, SGE), including cluster configuration and management tools; and
leveraging open source or commercial cloud technologies (e.g., Open Science Grid, AWS, Azure, GCP).

Key Responsibilities:

Partner with faculty and research staff to leverage HPC clusters at MGHPCC.
Work with research groups to help strategize, streamline, and implement optimized ML workflows on the cluster.
Participate in the research, deployment, and advertising of new ML technologies.
Troubleshoot, isolate, and resolve application errors and other technical issues.
Benchmark application codes and deploy application performance tools.
Effectively and efficiently resolve support/help tickets from HPC users within the global Northeastern University community.
Engage in research and author or co-author papers and research grants.
Assist in developing and writing proposals to enhance the research enterprise at Northeastern.
Ensure the maintenance and/or creation of documentation, training (internal and external), and communication in support of the high-performance computing infrastructure.
Follow advancements on the cloud, research, and parallel computing fronts.
Attend conferences and workshops relevant to HPC to advance skills.
Promote diversity, equity, inclusion, and accessibility by fostering a collaborative workplace and group culture.

Cover Letter

Applicants are encouraged to include a cover letter highlighting the answers to two key questions:

(1) One technical incident/experience related to HPC that you are most proud of (that is, an incident where you identified an issue/challenge and led the charge in devising and implementing a technical solution).
(2) One non-technical incident/experience where you believe your strong work ethic, proactiveness, inclusive nature, and/or team-player spirit led to significant success for the whole team.

Please limit the cover letter to a single page.
Position Type

Research

Additional Information

Northeastern University considers factors such as candidate work experience, education and skills when extending an offer.

Northeastern has a comprehensive benefits package for benefit-eligible employees. This includes medical, vision, dental, paid time off, tuition assistance, wellness & life, and retirement, as well as commuting & transportation. Visit https://hr.northeastern.edu/benefits/ for more information.

Northeastern University is an equal opportunity employer, seeking to recruit and support a broadly diverse community of faculty and staff. Northeastern values and celebrates diversity in all its forms and strives to foster an inclusive culture built on respect that affirms inter-group relations and builds cohesion.

All qualified applicants are encouraged to apply and will receive consideration for employment without regard to race, religion, color, national origin, age, sex, sexual orientation, disability status, or any other characteristic protected by applicable law.

To learn more about Northeastern University's commitment to and support of diversity and inclusion, please see www.northeastern.edu/diversity.