Pacific Northwest National Laboratory
Data Scientist 2 - Artificial Intelligence
Pacific Northwest National Laboratory, Juneau, Alaska, US, 99812
Researchers in the Physical and Computational Sciences Directorate (PCSD) lead major R&D efforts in experimental and theoretical interfacial chemistry, chemical analysis, high energy physics, interfacial catalysis, multifunctional materials, and integrated high-performance and data-intensive computing. PCSD is PNNL’s primary steward for research supported by the Department of Energy’s Offices of Basic Energy Sciences, Advanced Scientific Computing Research, and Nuclear Physics, all within the Department of Energy's Office of Science. Additionally, Directorate staff perform research and development for private industry and other government agencies, such as the Department of Defense and NASA. The Directorate's researchers are members of interdisciplinary teams tackling challenges of national importance that cut across all missions of the Department of Energy.

Responsibilities

The Data Science and Machine Intelligence Group is seeking a highly motivated Data Scientist with a strong background in Natural Language Processing (NLP) to join our dynamic and rapidly growing group. In particular, this position focuses on the development of LLMs (training and evaluation) and the application of NLP methods to solve challenging domain science problems.

Designs, develops, and implements methods, processes, and systems to analyze diverse data.
Applies knowledge of statistics, machine learning, advanced mathematics, simulation, software development, and data modeling to integrate and clean data, recognize patterns, address uncertainty, pose questions, and make discoveries from structured and/or unstructured data.
Produces solutions driven by exploratory data analysis from complex and high-dimensional datasets.
Designs, develops, and evaluates predictive models and advanced algorithms that lead to optimal value extraction from the data.
Demonstrates ability to transfer skills across application domains.
Trains and evaluates Large Language Models/Multimodal Language Models.
Applies Large Language Models to solve different problems in science and beyond.
Performs natural language processing and prompt engineering.
Performs general data science, data curation, and data processing.
Applies basic and advanced mathematical principles to identify trends in data sets.

Qualifications

Minimum Qualifications:
BS/BA and 2 years of relevant experience -OR- MS/MA -OR- PhD

Preferred Qualifications:
Scientific publications in top-tier venues (AAAI, NeurIPS, *ACL, EMNLP, ICLR, KDD, etc.) with significant contribution to the published work, preferably first-author publications.
Experience working in Python, including PyTorch and LLM training and evaluation frameworks (e.g., Torchtitan, DeepSpeed, LoRA).
Experience with LLM/Multimodal Language Model (MLM) training and evaluation.
Strong NLP knowledge to modify LLM/MLM architectures and training methodologies to improve downstream performance.
Familiarity with cloud platforms and services (e.g., AWS, Google Cloud, Azure) for deploying large language models and large multimodal models.
Experience with distributed training frameworks and tools such as DeepSpeed, HuggingFace Transformers and Accelerate, Megatron, etc.
Interest in working on research projects focused on building AI applications for science and energy domains.
Strong foundation in core data science and machine learning skills, including data/information curation from large unstructured data in the form of PDFs (text, image).
Experience working in a distributed team with diverse backgrounds is a plus.

About PNNL

Pacific Northwest National Laboratory (PNNL) is a world-class research institution powered by a highly educated, diverse workforce committed to the values of Integrity, Creativity, Collaboration, Impact, and Courage. At PNNL, you will find an exciting research environment and excellent benefits, including health insurance and flexible work schedules.

PNNL is located in eastern Washington State, the dry side of Washington, known for its stellar outdoor recreation and affordable cost of living.

Commitment to Excellence, Diversity, Equity, Inclusion, and Equal Employment Opportunity

Our laboratory is committed to a diverse and inclusive work environment dedicated to solving critical challenges in fundamental sciences, national security, and energy resiliency.