
Data Engineer

JobRialto, Cincinnati, Ohio, United States, 45208


Job Title: Data Engineer

Job Summary

We are seeking a skilled Data Engineer to join our Data Science team. The ideal candidate will design, build, and maintain scalable data pipelines and infrastructure to support analytics, machine learning, and Retrieval-Augmented Generation (RAG)-type Large Language Model (LLM) workflows. This role requires strong technical expertise, problem-solving skills, and the ability to collaborate effectively with data scientists, analysts, and other stakeholders.

Key Responsibilities

Data Pipeline Development:

• Design, develop, and maintain robust and scalable ETL (Extract, Transform, Load) processes (a minimal sketch follows below).

• Ensure efficient and accurate data collection, processing, and storage.
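
For illustration only, a minimal Python ETL sketch in the spirit of this responsibility, using pandas and SQLAlchemy. The source URL, connection string, target table, and the order_date column are placeholders, not details from this posting.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder source and target -- the real systems are project-specific.
SOURCE_CSV = "https://example.com/exports/orders.csv"
TARGET_DB = "postgresql://user:password@localhost:5432/analytics"

def extract() -> pd.DataFrame:
    # Pull raw records from the upstream source.
    return pd.read_csv(SOURCE_CSV)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Basic cleaning: deduplicate, normalize column names, parse dates.
    df = raw.drop_duplicates()
    df.columns = [c.strip().lower() for c in df.columns]
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    return df.dropna(subset=["order_date"])

def load(df: pd.DataFrame) -> None:
    # Append the cleaned batch to the warehouse table.
    engine = create_engine(TARGET_DB)
    df.to_sql("orders_clean", engine, if_exists="append", index=False)

if __name__ == "__main__":
    load(transform(extract()))
```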

Data Integration:

• Integrate data from various sources, including databases, APIs, and third-party providers; see the illustrative sketch after this list.

• Maintain data consistency and integrity across different systems.
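
A hedged sketch of combining an API source with a relational source in Python. The endpoint, database file, table, and the customer_id join key are hypothetical stand-ins for whatever the actual systems expose.

```python
import sqlite3

import pandas as pd
import requests

# Placeholder endpoints -- the actual API, database, and schema are hypothetical.
API_URL = "https://example.com/api/customers"
DB_PATH = "warehouse.db"

def fetch_api_customers() -> pd.DataFrame:
    # Third-party provider exposing customer records as JSON.
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

def fetch_db_orders() -> pd.DataFrame:
    # Internal orders stored in a relational database.
    with sqlite3.connect(DB_PATH) as conn:
        return pd.read_sql("SELECT order_id, customer_id, amount FROM orders", conn)

def integrate() -> pd.DataFrame:
    # Enforce a consistent key type across systems, then join on the shared key.
    customers = fetch_api_customers().astype({"customer_id": "int64"})
    orders = fetch_db_orders().astype({"customer_id": "int64"})
    return orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
```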

RAG Type LLM Workflows:

• Develop and maintain pipelines tailored for RAG-type LLM workflows.

• Ensure efficient data retrieval and augmentation for LLM training and inference (see the retrieval sketch below).

• Collaborate with data scientists to optimize data pipelines for performance and accuracy.
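
A rough sketch of the retrieve-then-augment pattern behind RAG workflows. A bag-of-words similarity stands in for a real embedding model and vector store, and the chunk texts are invented for the example.

```python
import math
from collections import Counter

# Toy chunk store -- in a real workflow, chunks and embeddings live in a vector database.
CHUNKS = [
    "Refunds are processed within 30 days of purchase.",
    "Standard shipping takes 3-5 business days.",
    "Support is available by email and phone.",
]

def bow(text: str) -> Counter:
    # Bag-of-words vector standing in for a learned embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query and keep the top k.
    q = bow(query)
    return sorted(CHUNKS, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Augment the query with retrieved context before sending it to an LLM.
    context = "\n".join(f"- {c}" for c in retrieve(query))
    return f"Answer using only the context below.\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
```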

Semantic/Ontology Data Layers:

• Develop and maintain semantic and ontology data layers to enhance integration and retrieval (a small example follows below).

• Enrich data semantically to support analytics and machine learning models.
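
An illustrative sketch, assuming rdflib, of a tiny ontology plus instance data and a retrieval query that exploits the class hierarchy. The http://example.org/ontology/ namespace and the Customer/PremiumCustomer classes are made up for the example.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

# Hypothetical namespace -- actual ontology IRIs are organization-specific.
EX = Namespace("http://example.org/ontology/")

g = Graph()
g.bind("ex", EX)

# Ontology layer: a small class hierarchy.
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.PremiumCustomer, RDFS.subClassOf, EX.Customer))

# Semantic layer: instance data linked to the ontology.
g.add((EX.alice, RDF.type, EX.PremiumCustomer))
g.add((EX.alice, RDFS.label, Literal("Alice")))

# Retrieval that exploits the hierarchy: find every customer, including
# members of subclasses, via a SPARQL property path.
query = """
PREFIX ex: <http://example.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?who WHERE { ?who a/rdfs:subClassOf* ex:Customer . }
"""
for row in g.query(query):
    print(row.who)  # -> http://example.org/ontology/alice
```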

Collaboration:

• Work closely with data scientists, analysts, and other stakeholders to understand requirements.

• Provide technical support and guidance on data-related issues.

Data Quality and Governance:

• Implement data quality checks and validation processes; an example check is sketched below.

• Adhere to data governance policies and best practices.
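
A minimal sketch of batch-level data quality checks with pandas. The required columns and the rules themselves are hypothetical examples, not this team's actual governance policies.

```python
import pandas as pd

# Hypothetical expectations for an "orders" extract -- real rules would come
# from the team's data governance policies.
REQUIRED_COLUMNS = {"order_id", "customer_id", "order_date", "amount"}

def validate(df: pd.DataFrame) -> list[str]:
    issues = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        # The remaining checks need these columns, so stop early.
        return [f"missing columns: {sorted(missing)}"]
    if df["order_id"].duplicated().any():
        issues.append("duplicate order_id values")
    if df["amount"].lt(0).any():
        issues.append("negative amounts")
    if df["order_date"].isna().any():
        issues.append("null order dates")
    return issues

# Example batch that fails three checks.
batch = pd.DataFrame({
    "order_id": [1, 1],
    "customer_id": [10, 11],
    "order_date": pd.to_datetime(["2024-01-02", None]),
    "amount": [19.99, -5.0],
})
print(validate(batch))
```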

Performance Optimization:

• Monitor and optimize the performance of data pipelines and infrastructure.

• Troubleshoot and resolve data-related issues efficiently.

Support for Analysis:

• Provide quick, reliable data access for ad-hoc analysis.

• Contribute to longer-term goals with scalable and maintainable data solutions.

Documentation:

• Maintain detailed documentation of data pipelines, processes, and infrastructure.

• Ensure knowledge transfer and continuity within the team.

Required Qualifications

• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

• 3+ years of experience in data engineering or a related role.

• Proficiency in Python (mandatory).

• Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB).

• Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka).

• Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.

• Experience building and maintaining data pipelines for LLM workflows, including data retrieval and augmentation.

• Familiarity with natural language processing (NLP) techniques and tools.

• Understanding of semantic and ontology data layers for integration and retrieval.

• Hands-on experience with ETL tools (e.g., Apache NiFi, Airflow, Talend).

• Strong analytical and problem-solving skills.

• Excellent communication and collaboration abilities.

Preferred Qualifications

• Experience with machine learning and data science workflows.

• Knowledge of data governance and compliance standards.

• Familiarity with data visualization tools (e.g., Tableau, Power BI).

• Certification in cloud platforms or data engineering.

Education:

Bachelor's Degree

Certification:

IBM Data Engineering