Insight Global

Remote Data Annotator

Insight Global, Raleigh, North Carolina, United States, 27601

Title: AI Data Trainer
Location: Remote
Duration: Contract-to-hire
Pay Rate: $25/hr
Work Authorization: USC/GC; no subcontracting during the contract duration

Overview:
Large language models are core to our client, and the data we collect is core to the language models we train. While the current iteration of LLMs is trained primarily on web text, the next generation of LLMs will rely on human annotation to create custom datasets that further develop the capabilities of these models. We are looking for an AI Data Trainer to work closely with engineering and product teams to lead the creation of custom datasets for training specialized models that enable enterprise solutions using the cutting-edge capabilities of LLMs. This role requires a diverse set of skills and draws on a range of disciplines. We are therefore considering a broad range of backgrounds for this role, including ML, NLP, HCI, software engineering, and relevant linguistic and social sciences.

Key Responsibilities:
- Collaborate with Data Science and Product teams to define annotation tasks, coordinate resourcing, and review annotated data for quality
- Develop and disseminate data labeling best practices learned from building enterprise solutions using LLMs
- Develop labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with Machine Learning Engineers for real-world use cases
- Collaborate with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

Must Haves:
- Bachelor's degree in Linguistics, Library Science, or a related field (open to non-traditional backgrounds as well)
- Experience with ontology development and information domain modeling
- Experience labeling conversational text for analysis as an AI trainer
- Experience with AI interaction, such as prompt generation and open AIs
- Experience running and managing human annotation jobs for large-scale data collection, with quality control and best practices for human annotation
- Proficiency with SQL, the terminal, and the command line
- Proficiency with Jupyter notebooks
- Ability to follow complex instructions, navigate ambiguity, and work independently
- Detail-oriented disposition and clear, concise communication skills
- Curiosity about technology and a knack for tackling problems in creative ways

Pluses:
- Proficiency in Japanese
- Experience developing labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with ML Engineers for real-world use cases
- Experience collaborating with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

Compensation: $25.00/hr