Apex Systems

Data Engineer

Apex Systems, Cincinnati, Ohio, 45208


Job: 2055521

Job Description:
Apex Systems is looking for a Data Engineer for one of our large clients in Cincinnati, OH.

Job Title: Data Engineer

Role Overview:
We are looking for a Data Engineer with a focus on processing unstructured data to enhance our enterprise data governance framework. You will design and implement scalable workflows that transform unstructured data sources into valuable assets, ensuring their metadata is integrated into our cataloging systems. This role combines hands-on engineering with problem-solving, requiring an understanding of parsing, transformation, and metadata management techniques.

Key Responsibilities:

Use Informatica CDGC and build Metadata Extraction and Integration:
  o Build workflows to extract metadata from unstructured sources such as documents, multimedia, and log files.
  o Transform raw metadata into a structured format for cataloging in tools like Informatica CDGC.

Data Transformation and Parsing:
  o Develop pipelines to parse complex data formats, including XML, JSON, and other semistructured/unstructured formats.
  o Apply advanced processing techniques to standardize and classify unstructured data for enterprise use.

Lineage and Data Governance:
  o Contribute to defining and documenting lineage for unstructured data assets, linking them to structured enterprise datasets.
  o Work with governance teams to ensure compliance with metadata quality standards and practices.

Pipeline Development and Optimization:
  o Design and maintain data pipelines that are efficient, scalable, and resilient for high volume unstructured data ingestion.
  o Optimize workflows to minimize latency and maximize accuracy in metadata processing.

Collaboration and Support:
  o Partner with catalog engineers, data architects, and governance stakeholders to implement solutions aligned with business goals.
  o Provide technical documentation and support to ensure ease of use and knowledge sharing within the team.

Qualifications:

Technical Expertise:
  o Hands-on experience with data pipelines and transformations for unstructured data.
  o Strong programming skills in languages such as Python, Java, or Scala, with an emphasis on text processing and data parsing.
  o Familiarity with tools and frameworks for data extraction, transformation, and integration into metadata systems.

Data Governance Knowledge:
  o Understanding of metadata management principles and techniques, especially in enterprise environments.
  o Knowledge of data lineage and its role in ensuring traceability for unstructured assets.

Problem-Solving and Innovation:
  o Creative thinking in addressing unstructured data challenges, such as extracting metadata from diverse formats.
  o Proven ability to troubleshoot complex workflows and ensure robust pipeline performance.

Preferred Skills:
  o Experience with metadata cataloging platforms, including Informatica CDGC or similar tools.
  o Exposure to document processing techniques such as OCR and natural language processing (NLP).
  o Familiarity with cloud data ecosystems like AWS, Azure, or Google Cloud for managing unstructured data.

EEO Employer
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law.

Apex Systems is a world-class I