Marathon TS

Generative AI Data Scientist

Marathon TS, Riverdale, Maryland, US, 20738

Subject: Request for Proposal

Reference #: ODID 75641_SSA_Generative AI Data Scientist

☒ Temp To Perm

Must be local to Baltimore

Role Title : Generative AI Data Scientist

Start Date for assignment : 12/02/2024

End Date for assignment : 09/28/2025

# of Resources Needed : 1

Hours per Week : 40

Job Description

Specialization : NA

Technical Skills :

Skill                            Years/Level of Experience
Generative AI                    P5 - Master
Python (Programming Language)    P5 - Master

P1 - Beginner (0-2 yrs experience)
P2 - Intermediate (3-5 yrs experience)
P3 - Advanced (7-10 yrs experience)
P4 - Expert (10+ yrs experience)

Role Description : Formulate, design, and deliver AI/Client-based decision-making frameworks and models for business outcomes. Measure and justify the value of AI/Client-based solutions.

Data Scientist GenAI Engineer: We are seeking a Python GenAI Engineer with expertise in developing, fine-tuning, and integrating AI models, particularly in natural language processing (NLP). This role focuses on building generative AI solutions for summarization tasks and other NLP applications, while incorporating prompt engineering and human-in-the-loop feedback to optimize AI outputs.

The ideal candidate will possess demonstrated prior experience analyzing unstructured medical records, developing AI models for extracting insights, and incorporating human-in-the-loop feedback to improve model performance. You will collaborate closely with data scientists, software engineers, and other stakeholders to integrate AI models into production environments within cloud infrastructure.

Required Qualifications & Experience:

• 5+ years of experience in AI/Client development with a strong focus on NLP and generative models, using frameworks such as TensorFlow, PyTorch, and Hugging Face
• Expertise in Python, with experience in libraries like Transformers, NLTK, SpaCy, Gensim, and data manipulation tools such as Pandas and NumPy
• Implement dynamic prompt engineering strategies to optimize model outputs (1-2 years preferred)
• Expertise in frameworks such as TensorFlow, PyTorch, and Hugging Face
• Strong proficiency in Python, with experience in libraries like Transformers and NLTK
• Familiarity with generative AI models such as OpenAI's GPT, Llama, and supporting libraries like VLLM
• Strong analytical skills and experience with statistical modeling and data analysis
• Ability to effectively articulate technical challenges and solutions
• Strong communicator with excellent written and verbal communication skills
• Identify and analyze user requirements to generate stories and tasks for team backlog
• Prioritize and execute tasks throughout the software development life cycle
• Create custom NLP algorithms and annotators to evaluate medical record data
• Create custom tools to enable analysts to perform data research
• Solid understanding of statistical modeling, data analysis, and performance evaluation metrics
• Demonstrated experience analyzing and processing unstructured clinical data (e.g., electronic health records, physician notes, imaging reports), using techniques such as tokenization, lemmatization, and word embeddings (e.g., TF-IDF, BERT)
• Familiarity with healthcare data formats and standards such as HL7, FHIR, ICD codes, and SNOMED
• Experience with cloud platforms (AWS, Azure), containerization (Docker), and using CI/CD pipelines for machine learning model deployment
• Knowledge of SQL (PostgreSQL, MySQL) and NoSQL (MongoDB, Elasticsearch) databases, and how to structure data pipelines for efficient data processing. Experience optimizing databases (SQL: PostgreSQL, MySQL; NoSQL: MongoDB, Elasticsearch) to support efficient data storage and retrieval for AI models
• Develop and fine-tune AI models for natural language processing (NLP) tasks, including Named Entity Recognition (NER), text classification, summarization, and sentiment analysis, particularly with unstructured clinical records
• Conduct experiments to evaluate model performance, utilizing metrics such as precision, recall, and F1-score to iteratively improve models through hyperparameter tuning and training optimizations
• Experience implementing prompt engineering strategies and applying transformer models to enhance generative AI outputs using frameworks like Hugging Face and PyTorch
• Experience integrating AI models into production environments, collaborating with software engineers and using cloud platforms like AWS to ensure scalability and performance
• Analyze and preprocess large datasets, particularly unstructured medical records (e.g., physician notes, discharge summaries), using tools like Pandas, NLTK, and SpaCy
• Stay updated with the latest research and advancements in AI and NLP, applying state-of-the-art techniques such as transfer learning, attention mechanisms, and fine-tuning pre-trained models to healthcare-specific challenges
• Master's degree (Data Science, AI, Computer Science, or a related field) + 10 years experience; or PhD + 4 years

Preferred Qualifications:

• Experience in healthcare, particularly working with unstructured medical records in clinical settings, leveraging NLP models for insight extraction
• Experience working with human-in-the-loop systems, incorporating clinician/end-user feedback and leveraging tools like SciPy and NumPy to improve AI model accuracy
• Educational background or practical training in a clinical setting, with exposure to clinical workflows and medical terminologies
• Familiarity with deep learning techniques, attention mechanisms, and transformers applied to healthcare data

Education Level : Master's Degree + 10 years experience

Work Location

On-site (Government / AFS Site):

Baltimore, MD

On-site %: 0%

Off-site (Contractor Site): Remote

Role is remote; candidate must be local to the DC, MD, VA (DMV) area for badging purposes and potential on-site work in the future.

Special Requirements

Work Authorization : ☒ US Citizens ☐ Dual Citizens ☐ US Persons (Green Card) ☐ Foreign Nationals (H1B etc.)

Clearance Required : ☒ Public Trust - Full Clearance ☐ Secret ☐ Top Secret ☐ TS/SCI ☐ None (Background check validation per MSA) ☐ Other

DPAS : ☐ Not Rated ☐ Rated (Rating #) ☒ TBD

NAICS Code and Size Standard for the Subcontract or Purchase Order is: 541512 Computer Systems Design Services ($34M)