Apex Systems
Data Engineer
Apex Systems, Cincinnati, Ohio, United States, 45208
Job#: 2051473
Job Description:
Apex Systems is looking for a Data Engineer for one of our large clients in Cincinnati, OH. If you are interested in any Data Engineer opportunities, please email Molly Biggers at [email protected]. We are seeking a skilled Data Engineer to join our Data Science team. The ideal candidate will be responsible for designing, building, and maintaining scalable data pipelines and infrastructure to support data analytics, machine learning, and Retrieval-Augmented Generation (RAG) type Large Language Model (LLM) workflows. This role requires a strong technical background, excellent problem-solving skills, and the ability to work collaboratively with data scientists, analysts, and other stakeholders.
Key Responsibilities:
Data Pipeline Development:
Design, develop, and maintain robust and scalable ETL (Extract, Transform, Load) processes.
Ensure data is collected, processed, and stored efficiently and accurately.
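For context, here is a minimal, purely illustrative sketch of the kind of ETL step this responsibility covers, assuming a hypothetical CSV export and a local SQLite target (file, table, and column names are placeholders, not details from the client environment):

```python
# Minimal ETL sketch (illustrative only): extract rows from a CSV file,
# apply a simple transformation, and load them into a local SQLite table.
# File names and column names are hypothetical placeholders.
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    """Read raw records from a CSV source."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    """Normalize fields and drop records missing a required key."""
    cleaned = []
    for row in rows:
        if not row.get("customer_id"):
            continue  # skip incomplete records
        cleaned.append((row["customer_id"].strip(), float(row.get("amount", 0) or 0)))
    return cleaned

def load(records: list[tuple], db_path: str = "warehouse.db") -> None:
    """Write transformed records into a target table."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)

if __name__ == "__main__":
    load(transform(extract("sales_export.csv")))
```

A production pipeline would add scheduling, retries, and incremental loads, but the extract/transform/load split shown here is the basic shape of the work.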
Data Integration:
Integrate data from various sources, including databases, APIs, and third-party data providers.
Ensure data consistency and integrity across different systems.
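As a rough sketch of this kind of integration work (with the database export and API payload stubbed as in-memory data, and all field names assumed for illustration):

```python
# Illustrative integration sketch: reconcile records from two sources (a
# database export and an API payload, both stubbed here as in-memory data)
# into one consistent view keyed on a shared identifier.
db_rows  = [{"customer_id": "42", "name": "Ada Lovelace"}]
api_rows = [{"id": "42", "lifetime_value": 1200.0}]

def integrate(db_rows: list[dict], api_rows: list[dict]) -> list[dict]:
    """Left-join API attributes onto database records by customer id."""
    api_by_id = {r["id"]: r for r in api_rows}
    merged = []
    for row in db_rows:
        extra = api_by_id.get(row["customer_id"], {})
        merged.append({**row, "lifetime_value": extra.get("lifetime_value")})
    return merged

print(integrate(db_rows, api_rows))
```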
RAG Type LLM Workflows:
Develop and maintain data pipelines specifically tailored for Retrieval-Augmented Generation (RAG) type Large Language Model (LLM) workflows.
Ensure efficient data retrieval and augmentation processes to support LLM training and inference.
Collaborate with data scientists to optimize data pipelines for LLM performance and accuracy.
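To make the retrieval step concrete, here is a hedged sketch: given a user query, score a small document store and return the top passages that would be injected into an LLM prompt. A real RAG pipeline would use learned embeddings and a vector database; this toy version uses a bag-of-words cosine similarity, and the documents are invented examples.

```python
# Toy retrieval step for a RAG workflow: rank stored passages against a
# query and assemble the retrieved context into a prompt for the LLM.
from collections import Counter
import math

DOCUMENTS = [
    "Quarterly revenue grew 8 percent driven by online sales.",
    "The ETL pipeline loads store transactions into the warehouse nightly.",
    "Customer churn is highest in the first 90 days after signup.",
]

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = vectorize(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    context = retrieve("how does the pipeline load transactions")
    prompt = "Answer using only this context:\n" + "\n".join(context)
    print(prompt)  # this prompt would be passed to the LLM
```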
Semantic/Ontology Data Layers:
Develop and maintain semantic and ontology data layers to enhance data integration and retrieval.
Ensure data is semantically enriched to support advanced analytics and machine learning models.
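The sketch below illustrates the idea of a lightweight semantic layer: raw source fields are re-keyed to shared ontology terms so downstream analytics and retrieval can query by business meaning rather than by source-specific column names. The ontology terms and source schemas are hypothetical; a production layer would typically use RDF/OWL vocabularies and a triple store rather than Python dictionaries.

```python
# Illustrative semantic layer: map raw source fields to shared ontology terms.
ONTOLOGY_MAP = {
    "crm":       {"cust_no": "customer:id", "fname": "customer:first_name"},
    "web_store": {"user_id": "customer:id", "first":  "customer:first_name"},
}

def to_semantic(source: str, record: dict) -> dict:
    """Re-key a raw record using ontology terms; unmapped fields are dropped."""
    mapping = ONTOLOGY_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

# Records from two different systems now share one vocabulary.
print(to_semantic("crm", {"cust_no": "42", "fname": "Ada"}))
print(to_semantic("web_store", {"user_id": "42", "first": "Ada"}))
```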
Collaboration:
Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
Provide technical support and guidance on data-related issues.
Data Quality and Governance:
Implement data quality checks and validation processes to ensure data accuracy and reliability.
Adhere to data governance policies and best practices.
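A minimal sketch of pre-load validation, assuming invented rules and field names: each check returns a list of issues, and the pipeline can fail fast or quarantine the offending rows.

```python
# Simple data quality checks run before loading records downstream.
def check_required(rows: list[dict], fields: list[str]) -> list[str]:
    """Flag rows missing any required field."""
    return [
        f"row {i}: missing {f}"
        for i, row in enumerate(rows)
        for f in fields
        if not row.get(f)
    ]

def check_range(rows: list[dict], field: str, lo: float, hi: float) -> list[str]:
    """Flag rows whose numeric field falls outside an expected range."""
    return [
        f"row {i}: {field}={row[field]} outside [{lo}, {hi}]"
        for i, row in enumerate(rows)
        if field in row and not (lo <= float(row[field]) <= hi)
    ]

rows = [{"customer_id": "42", "amount": "19.99"}, {"customer_id": "", "amount": "-5"}]
issues = check_required(rows, ["customer_id"]) + check_range(rows, "amount", 0, 10_000)
if issues:
    print("Data quality checks failed:\n" + "\n".join(issues))
```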
Performance Optimization:
Monitor and optimize the performance of data pipelines and infrastructure.
Troubleshoot and resolve data-related issues in a timely manner.
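One basic form this monitoring can take is stage-level timing, sketched below with a decorator that logs how long each pipeline step runs; in practice the timings would feed a metrics system rather than the log alone, and the stage shown is a stand-in.

```python
# Hedged sketch of basic pipeline monitoring: time each stage and log it.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def timed(stage: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                log.info("%s finished in %.2fs", stage, time.perf_counter() - start)
        return wrapper
    return decorator

@timed("transform")
def transform(rows):
    return [r for r in rows if r]

transform([{"id": 1}, {}, {"id": 2}])
```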
Support for Analysis:
Support short-term ad-hoc analysis by providing quick and reliable data access.
Contribute to longer-term goals by developing scalable and maintainable data solutions.
Documentation:
Maintain comprehensive documentation of data pipelines, processes, and infrastructure.
Ensure knowledge transfer and continuity within the team.
Technical Requirements:
Education and Experience:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
3+ years of experience in data engineering or a related role.
Technical Skills:
Proficiency in Python (mandatory).
Experience with other programming languages such as Java or Scala is a plus.
Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB).
Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka).
Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.
RAG Type LLM Skills: