JobRialto

Data Engineer

JobRialto, Cincinnati, Ohio, United States, 45208

Job Summary:

We are seeking a highly skilled Data Engineer specializing in unstructured data processing to enhance our enterprise data governance framework. This role involves designing and implementing scalable workflows that transform unstructured data into valuable assets and integrate its metadata into cataloging systems like Informatica CDGC. The ideal candidate combines hands-on technical expertise with innovative problem-solving skills in parsing, transformation, and metadata management.

Key Responsibilities:

• Extract metadata from unstructured sources such as documents, multimedia, and log files, and transform it into structured formats for cataloging in tools like Informatica CDGC.

• Build pipelines to parse complex data formats, including XML, JSON, and other semi-structured/unstructured formats.

• Apply advanced processing techniques to standardize and classify unstructured data for enterprise applications.

• Define and document lineage for unstructured data assets, linking them to structured enterprise datasets.

• Collaborate with governance teams to ensure compliance with metadata quality standards.

• Design and maintain efficient, scalable, and resilient pipelines for high-volume unstructured data ingestion.

• Optimize workflows to reduce latency and enhance metadata processing accuracy.

• Work with catalog engineers, data architects, and governance stakeholders to implement business-aligned solutions.

• Provide technical documentation and share knowledge to support team collaboration.

Required Qualifications:

• Hands-on experience in developing data pipelines for unstructured data.

• Strong programming skills in Java (preferred) or Python, with expertise in text processing and data parsing.

• Proficiency in tools and frameworks for data extraction and integration into metadata systems.

• In-depth understanding of metadata management and data lineage principles.

• Demonstrated problem-solving skills in addressing unstructured data challenges.

Preferred Qualifications:

• Experience with metadata cataloging platforms like Informatica CDGC.

• Exposure to document processing techniques, such as OCR and natural language processing (NLP).

• Familiarity with cloud ecosystems like AWS, Azure, or Google Cloud for managing unstructured data.

Education:

Bachelor's degree