Understanding Recruitment Inc

Software Engineer - Pretraining Data

Understanding Recruitment Inc, Los Gatos, California, United States, 95032


Introduction:

We are on a mission to build safe AGI that accelerates humanity’s progress on critical global challenges. Our strategy leverages frontier-scale pre-training, domain-specific RL, ultra-long context, and test-time compute. If you're a Software Engineer passionate about pretraining data and creating efficient, robust data pipelines, this role is for you.

About the Company:

Our organization is dedicated to automating research and code generation to improve models and solve alignment issues more effectively than humans alone. We focus on high-quality data processing and innovative solutions, contributing to significant advancements in AI and AGI safety.

About the Role:

As a Software Engineer specializing in pretraining data, you will develop and optimize web scraping techniques to handle massive, multimodal datasets. Your expertise will be crucial in building and maintaining data pipelines that support our advanced AI models.

What We Can Offer You:

- Significant equity component
- 401(k) plan with 6% matching
- Comprehensive health, dental, and vision insurance for you and your dependents
- Unlimited paid time off
- Flexibility to work in-person in San Francisco or remotely
- Visa sponsorship and relocation stipend available

Key Responsibilities:

- Design and implement multimodal web crawlers for large-scale data collection
- Develop and maintain large-scale data processing pipelines using tools like Ray, Apache Spark, and Google BigQuery
- Implement deduplication techniques across multiple data modalities
- Apply heuristic and model-based techniques for parsing and filtering data
- Identify and integrate new data sources into pre/post-training datasets

Join us to shape the future of AGI by contributing to our innovative approach to data processing and AI model improvement. Your skills as a Software Engineer in pretraining data will drive our mission forward.
