Pacific Northwest National Laboratory
Data Scientist 2 - Artificial Intelligence
Pacific Northwest National Laboratory, Providence, Rhode Island, US, 02912
Researchers in the Physical and Computational Sciences Directorate (PCSD) lead major R&D efforts in experimental and theoretical interfacial chemistry, chemical analysis, high energy physics, interfacial catalysis, multifunctional materials, and integrated high-performance and data-intensive computing. PCSD is PNNL’s primary steward for research supported by the Department of Energy’s Offices of Basic Energy Sciences, Advanced Scientific Computing Research, and Nuclear Physics, all within the Department of Energy's Office of Science. Additionally, Directorate staff perform research and development for private industry and other government agencies, such as the Department of Defense and NASA. The Directorate's researchers are members of interdisciplinary teams tackling challenges of national importance that cut across all missions of the Department of Energy.

Responsibilities

The Data Science and Machine Intelligence Group is seeking a highly motivated Data Scientist with a strong background in Natural Language Processing (NLP) to join our dynamic and rapidly growing group. The focus of this position is on the development (training and evaluation of LLMs) and application of NLP methods to solve challenging domain science problems. Preferred skills include familiarity and prior experience with data science, high-performance computing, high-level languages such as Python, and AI/ML libraries such as PyTorch and TensorFlow, as well as a record of high-quality research publications in top-tier peer-reviewed venues.

- Designs, develops, and implements methods, processes, and systems to analyze diverse data.
- Applies knowledge of statistics, machine learning, advanced mathematics, simulation, software development, and data modeling to integrate and clean data, recognize patterns, address uncertainty, pose questions, and make discoveries from structured and/or unstructured data.
- Produces solutions driven by exploratory data analysis from complex and high-dimensional datasets.
- Designs, develops, and evaluates predictive models and advanced algorithms that lead to optimal value extraction from the data.
- Demonstrates ability to transfer skills across application domains.
- Training and evaluation of large language models (LLMs) and multimodal language models.
- Application of large language models to solve different problems in science and beyond.
- Natural language processing and prompt engineering.
- General data science, data curation, and data processing.
- Applies basic and advanced mathematical principles to identify trends in data sets.

Qualifications

Minimum Qualifications:

- BS/BA and 2 years of relevant experience -OR- MS/MA -OR- PhD

Preferred Qualifications:

- Scientific publications in top-tier venues (AAAI, NeurIPS, *ACL, EMNLP, ICLR, KDD, etc.) with significant contribution to the published work, preferably first-author publications.
- Experience working in Python, including PyTorch and LLM training and evaluation frameworks (e.g., Torchtitan, DeepSpeed, LoRA).
- Experience with LLM and multimodal language model (MLM) training and evaluation.
- Strong NLP knowledge to modify LLM/MLM architectures and training methodologies to improve downstream performance.
- Familiarity with cloud platforms and services (e.g., AWS, Google Cloud, Azure) for deploying large language models and large multimodal models.
- Experience with distributed training frameworks and tools such as DeepSpeed, HuggingFace Transformers and Accelerate, Megatron, etc.
- Interest in working on research projects focused on building AI applications for science and energy domains.
- Strong foundation in data science and machine learning core skills, including data/information curation from large unstructured data in the form of PDFs (text, image).
- Experience working in a distributed team with diverse backgrounds is a plus.

Hazardous Working Conditions/Environment

Not applicable

Additional Information

Not applicable

Testing Designated Position

This is not a Testing Designated Position (TDP).

About PNNL

Pacific Northwest National Laboratory (PNNL) is a world-class research institution powered by a highly educated, diverse workforce committed to the values of Integrity, Creativity, Collaboration, Impact, and Courage. Every year, scores of dynamic, driven people come to PNNL to work with renowned researchers on meaningful science, innovations, and outcomes for the U.S. Department of Energy and other sponsors; here is your chance to be one of them!

At PNNL, you will find an exciting research environment and excellent benefits, including health insurance and flexible work schedules. PNNL is located in eastern Washington State, the dry side of Washington, known for its stellar outdoor recreation and affordable cost of living. The Lab’s campus is only a 45-minute flight (or 3-hour drive) from Seattle or Portland, and is served by the convenient PSC airport, connected to 8 major hubs.

Commitment to Excellence, Diversity, Equity, Inclusion, and Equal Employment Opportunity

Our laboratory is committed to a diverse and inclusive work environment dedicated to solving critical challenges in fundamental sciences, national security, and energy resiliency. We are proud to be an Equal Employment Opportunity and Affirmative Action employer. In support of this commitment, we encourage people of all racial/ethnic identities, women, veterans, and individuals with disabilities to apply for employment.

Drug Free Workplace

PNNL is committed to a drug-free workplace supported by the Workplace Substance Abuse Program (WSAP) and complies with federal laws prohibiting the possession and use of illegal drugs. If you are offered employment at PNNL, you must pass a drug test prior to commencing employment. PNNL complies with federal law regarding illegal drug use. Under federal law, marijuana remains an illegal drug.
If you test positive for any illegal controlled substance, including marijuana, your offer of employment will be withdrawn.

HSPD-12 PIV Credential Requirement

In accordance with Homeland Security Presidential Directive 12 (HSPD-12) and Department of Energy (DOE) Order 473.1A, new employees are required to obtain and maintain an HSPD-12 Personal Identity Verification (PIV) Credential. To obtain this credential, new employees must successfully complete and pass a Federal Tier 1 background check investigation.

Mandatory Requirements

Please be aware that the Department of Energy (DOE) prohibits DOE employees and contractors from having any affiliation with the foreign government of a country DOE has identified as a “country of risk” without explicit approval by DOE and Battelle.