JobRialto
Data Engineer
JobRialto, Union, Kentucky, United States, 41091
Job Title: Data Engineer
Job Summary
We are seeking a skilled Data Engineer to join our Data Science team. The ideal candidate will design, build, and maintain scalable data pipelines and infrastructure to support analytics, machine learning, and Retrieval-Augmented Generation (RAG) workflows for Large Language Models (LLMs). This role requires strong technical expertise, problem-solving skills, and the ability to collaborate effectively with data scientists, analysts, and other stakeholders.
Key Responsibilities
Data Pipeline Development:
• Design, develop, and maintain robust and scalable ETL (Extract, Transform, Load) processes.
• Ensure efficient and accurate data collection, processing, and storage.
Data Integration:
• Integrate data from various sources, including databases, APIs, and third-party providers.
• Maintain data consistency and integrity across different systems.
RAG-Type LLM Workflows:
• Develop and maintain pipelines tailored for RAG-type LLM workflows.
• Ensure efficient data retrieval and augmentation for LLM training and inference.
• Collaborate with data scientists to optimize data pipelines for performance and accuracy.
Semantic/Ontology Data Layers:
• Develop and maintain semantic and ontology data layers to enhance integration and retrieval.
• Enrich data semantically to support analytics and machine learning models.
Collaboration:
• Work closely with data scientists, analysts, and other stakeholders to understand requirements.
• Provide technical support and guidance on data-related issues.
Data Quality and Governance:
• Implement data quality checks and validation processes.
• Adhere to data governance policies and best practices.
Performance Optimization:
• Monitor and optimize the performance of data pipelines and infrastructure.
• Troubleshoot and resolve data-related issues efficiently.
Support for Analysis:
• Provide quick, reliable data access for ad-hoc analysis.
• Contribute to longer-term goals with scalable and maintainable data solutions.
Documentation:
• Maintain detailed documentation of data pipelines, processes, and infrastructure.
• Ensure knowledge transfer and continuity within the team.
Required Qualifications
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
• 3+ years of experience in data engineering or a related role.
• Proficiency in Python (mandatory).
• Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB).
• Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka).
• Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.
• Experience building and maintaining data pipelines for LLM workflows, including data retrieval and augmentation.
• Familiarity with natural language processing (NLP) techniques and tools.
• Understanding of semantic and ontology data layers for integration and retrieval.
• Hands-on experience with ETL tools (e.g., Apache NiFi, Airflow, Talend).
• Strong analytical and problem-solving skills.
• Excellent communication and collaboration abilities.
Preferred Qualifications
• Experience with machine learning and data science workflows.
• Knowledge of data governance and compliance standards.
• Familiarity with data visualization tools (e.g., Tableau, Power BI).
• Certification in cloud platforms or data engineering.
Education:
Bachelor's Degree
Certification:
IBM Data Engineering