Insight Global

Remote Data Annotator Job at Insight Global in Chula Vista

Insight Global, Chula Vista, CA, United States

Title: AI Data Trainer

Location: remote

Duration: contract-to-hire

Pay Rate: $25/hr

Work Authorization: U.S. citizen or green card holder (USC/GC); no subcontracting during the contract duration

Overview:

Large language models are core to our client, and the data we collect is core to the language models we train. While the current iteration of LLMs is trained primarily on web text, the next generation of LLMs will rely on human annotation to create custom datasets that further develop the capabilities of these models. We are looking for an AI Data Trainer to work closely with engineering and product teams and lead the creation of custom datasets for training specialized models that enable enterprise solutions built on the cutting-edge capabilities of LLMs. This role requires a diverse set of skills and draws on a range of disciplines. We are therefore considering a broad range of backgrounds for this role, including ML, NLP, HCI, software engineering, and relevant linguistic and social sciences.

Key responsibilities:

Collaborate with Data Science and Product teams to define annotation tasks, coordinate resourcing, and review annotated data for quality
Develop and disseminate data labeling best practices learned from building enterprise solutions using LLMs
Develop labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with Machine Learning Engineers for real-world use cases
Collaborate with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

Must Haves:

Bachelor's degree in Linguistics, Library Science, or a related field (open to non-traditional backgrounds as well!)
Experience with ontology development and information domain modeling
Experience labeling conversational text for analysis as an AI trainer
Experience interacting with AI systems, such as prompt generation and use of open AI tools
Experience running and managing human annotation jobs for large-scale data collection with quality control and best practices for human annotation
Proficiency with SQL and the command line (terminal)
Proficiency with Jupyter notebooks
Ability to follow complex instructions, navigate ambiguity, and work independently
Detail-oriented disposition and clear, concise communication skills
Curiosity about technology and a knack for tackling problems in creative ways

Pluses:

Proficiency in Japanese
Experience developing labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with ML Engineers for real-world use cases
Experience collaborating with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

Compensation: $25.00/hr