liquid.ai
Member of Technical Staff - ML Research Engineer; Data Generation
liquid.ai, San Francisco, California, United States, 94199
Liquid AI, an MIT spin-off, is a foundation model company headquartered in Boston, Massachusetts. Our mission is to build capable and efficient general-purpose AI systems at every scale.
Our goal at Liquid is to build the most capable AI systems to solve problems at every scale, so that users can build, access, and control their own AI solutions. This ensures that AI is meaningfully, reliably, and efficiently integrated across enterprises. In the long term, Liquid will create and deploy frontier-AI-powered solutions that are available to everyone.
We are seeking a highly skilled ML Engineer to play a critical role in our foundation model development process. The ideal candidate will design, develop, and implement sophisticated synthetic and real-world data generation strategies that feed and improve our model training pipeline.
Key Responsibilities
Design and implement comprehensive data generation strategies for foundation model training
Develop synthetic data generation techniques that enhance model performance and diversity
Curate, clean, and validate large-scale real-world datasets
Create advanced data augmentation and transformation pipelines
Ensure data quality, address ethical considerations, and mitigate bias in data generation
Develop tools and frameworks for reproducible and scalable data generation
Monitor and assess the impact of generated data on model performance
Required Qualifications
Ph.D. or Master's degree in Computer Science, Machine Learning, Statistics, or related field
Experience in data generation, synthetic data creation, or machine learning data pipelines
Strong programming skills
Experience with machine learning frameworks, ideally PyTorch
Deep understanding of generative AI techniques
Expertise in data augmentation, transformation, and cleaning methodologies
Strong statistical and mathematical background
Preferred Skills
Experience with large language models or multimodal foundation models
Knowledge of differential privacy and data anonymization techniques
Experience with data ethics and bias detection
Publications or research in synthetic data generation
Understanding of scalable data processing architectures