Harnham

Lead Data Engineer

Harnham, Houston, TX, United States


Title: Lead Data Engineer

Location: Houston, TX 77077 - Onsite 4 days per week.

Overview of Role:

As Lead Data Engineer, you will help architect, implement, and scale new AI/ML initiatives at the enterprise level. The role sits on a brand-new team formed as our client's business grows, and it combines systems engineering, data analytics, and data integration.

Company Description:

Our client manages a large portfolio of companies and places a strong emphasis on integrity and accountability to the highest ethical standards. They are committed to fostering a culture of open communication, teamwork, and personal development.

Role Description:

  • Design and implement scalable, reliable data pipelines using technologies like Apache Spark, Hadoop, and Kafka.
  • Leverage AWS or Azure services (e.g., EC2, RDS, S3, Lambda, Azure Data Lake) for efficient data handling and processing.
  • Develop and optimize data models and storage solutions (SQL, NoSQL, Data Lakes) to ensure data quality and accessibility for operational and analytical applications.
  • Automate data workflows with ETL tools and frameworks (e.g., Apache Airflow, Talend) for efficient data integration and timely availability.
  • Collaborate with data scientists, providing the necessary infrastructure and tools for complex analytical models, using Python or R.
  • Ensure data governance and security compliance, implementing best practices in encryption, masking, and access controls within a cloud environment.

Skills and Experience:

  • Bachelor's degree in Computer Science, MIS, or equivalent education/experience.
  • Extensive background and experience in ETL/ELT pipelining.
  • 3+ years of experience with big data technologies (Hadoop, Spark, Kafka) and cloud services (AWS preferred; Azure, GCP) in a storage/processing capacity.
  • Experience with cloud computing environments (AWS, Azure, GCP) and Data/ML platforms (Databricks, Spark).
  • Relational database management system (RDBMS) experience, e.g., PostgreSQL.
  • ML model deployment experience is a plus!
  • Nice-to-have certifications: AWS Certified Solutions Architect, Azure Data Engineer Associate, Databricks Certified Associate Developer for Apache Spark.

Benefits:

  • Health, Dental, and Vision Insurance
  • 15+ Days of PTO Annually
  • Educational Assistance available
  • Annual bonus (20%)
  • 401(k) Matching Program

Please note: Candidates must be authorized to work in the United States to be considered at this time.