Insight Global
AI Data Annotator
Insight Global, Stanford, California, 94305
Overview:
Large language models are core to our client, and the data we collect is core to the language models we train. While the current iteration of LLMs is trained primarily on web text, the next generation of LLMs will rely on human annotation to create custom datasets that further develop the capabilities of these models. We are looking for an AI Data Trainer to work closely with engineering and product teams and lead the creation of custom datasets for training specialized models that enable enterprise solutions built on LLMs' cutting-edge capabilities. This role requires a diverse set of skills and draws on a range of disciplines. We are therefore considering a broad range of backgrounds for this role, including ML, NLP, HCI, software engineering, and relevant linguistic and social sciences.

Key responsibilities:
- Collaborate with Data Science and Product teams to define annotation tasks, coordinate resourcing, and review annotated data for quality
- Develop and disseminate data labeling best practices learned from building enterprise solutions using LLMs
- Develop labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with Machine Learning Engineers for real-world use cases
- Collaborate with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters.
Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience
Must Haves:
- Bachelor's degree in Linguistics, Library Science, or a related field (open to non-traditional backgrounds as well)
- Experience with ontology development and information domain modeling
- Experience labeling conversational text for analysis as AI trainers
- Experience with AI interaction, such as prompt generation and open AIs
- Experience running and managing human annotation jobs for large-scale data collection with quality control and best practices for human annotation
- Proficiency with SQL, terminal, and command line
- Proficiency with Jupyter notebooks
- Ability to follow complex instructions, navigate ambiguity, and work independently
- Detail-oriented disposition and clear, concise communication skills
- Curiosity about technology and a knack for tackling problems in creative ways

Nice to Have Skills & Experience
Plusses:
- Proficiency in Japanese
- Experience developing labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with ML Engineers for real-world use cases
- Experience collaborating with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.