JobRialto
Data Engineer
JobRialto, Cincinnati, Ohio, United States, 45208
Job Title: Data Engineer
Job Summary
We are seeking a skilled Data Engineer to join our Data Science team. The ideal candidate will design, build, and maintain scalable data pipelines and infrastructure to support analytics, machine learning, and Retrieval-Augmented Generation (RAG) workflows for Large Language Models (LLMs). This role requires strong technical expertise, problem-solving skills, and the ability to collaborate effectively with data scientists, analysts, and other stakeholders.
Key Responsibilities
Data Pipeline Development:
• Design, develop, and maintain robust and scalable ETL (Extract, Transform, Load) processes.
• Ensure efficient and accurate data collection, processing, and storage.
Data Integration:
• Integrate data from various sources, including databases, APIs, and third-party providers.
• Maintain data consistency and integrity across different systems.
RAG-Type LLM Workflows:
• Develop and maintain pipelines tailored for RAG-type LLM workflows.
• Ensure efficient data retrieval and augmentation for LLM training and inference.
• Collaborate with data scientists to optimize data pipelines for performance and accuracy.
Semantic/Ontology Data Layers:
• Develop and maintain semantic and ontology data layers to enhance integration and retrieval.
• Enrich data semantically to support analytics and machine learning models.
Collaboration:
• Work closely with data scientists, analysts, and other stakeholders to understand requirements.
• Provide technical support and guidance on data-related issues.
Data Quality and Governance:
• Implement data quality checks and validation processes.
• Adhere to data governance policies and best practices.
Performance Optimization:
• Monitor and optimize the performance of data pipelines and infrastructure.
• Troubleshoot and resolve data-related issues efficiently.
Support for Analysis:
• Provide quick, reliable data access for ad-hoc analysis.
• Contribute to longer-term goals with scalable and maintainable data solutions.
Documentation:
• Maintain detailed documentation of data pipelines, processes, and infrastructure.
• Ensure knowledge transfer and continuity within the team.
Required Qualifications
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
• 3+ years of experience in data engineering or a related role.
• Proficiency in Python (mandatory).
• Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB).
• Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka).
• Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.
• Experience building and maintaining data pipelines for LLM workflows, including data retrieval and augmentation.
• Familiarity with natural language processing (NLP) techniques and tools.
• Understanding of semantic and ontology data layers for integration and retrieval.
• Hands-on experience with ETL tools (e.g., Apache NiFi, Airflow, Talend).
• Strong analytical and problem-solving skills.
• Excellent communication and collaboration abilities.
Preferred Qualifications
• Experience with machine learning and data science workflows.
• Knowledge of data governance and compliance standards.
• Familiarity with data visualization tools (e.g., Tableau, Power BI).
• Certification in cloud platforms or data engineering.
Education:
Bachelor's Degree
Certification:
IBM Data Engineering