Lead Data Engineer
Harnham, Houston, TX, United States
Title: Lead Data Engineer
Location: Houston, TX 77077 - Onsite 4 days per week.
Overview of Role:
As Lead Data Engineer, you will help architect, implement, and scale new AI/ML initiatives at the enterprise level. This role is part of a brand-new team formed as our client's business grows, and it combines systems engineering, data analytics, and data integration.
Company Description:
Our client manages a large portfolio of companies, places a strong emphasis on integrity, and holds itself accountable to the highest ethical standards. They are committed to fostering a culture of open communication, teamwork, and personal development.
Role Description:
- Design and implement scalable, reliable data pipelines using technologies like Apache Spark, Hadoop, and Kafka.
- Leverage AWS or Azure services (e.g., EC2, RDS, S3, Lambda, Azure Data Lake) for efficient data handling and processing.
- Develop and optimize data models and storage solutions (SQL, NoSQL, Data Lakes) to ensure data quality and accessibility for operational and analytical applications.
- Automate data workflows with ETL tools and frameworks (e.g., Apache Airflow, Talend) to ensure efficient data integration and timely data availability.
- Collaborate with data scientists, providing the infrastructure and tools they need to build complex analytical models in Python or R.
- Ensure data governance and security compliance, implementing best practices in encryption, masking, and access controls within a cloud environment.
Skills and Experience:
- Bachelor's degree in Computer Science, MIS, or equivalent education/experience.
- Extensive background and experience in ETL/ELT pipelining.
- 3+ years of experience with big data technologies (Hadoop, Spark, Kafka) and with cloud services (AWS preferred; Azure or GCP also considered) for data storage and processing.
- Experience with cloud computing environments (AWS, Azure, GCP) and data/ML platforms (Databricks, Spark).
- Experience with relational database management systems (RDBMS), e.g., PostgreSQL.
- ML model deployment experience is a plus!
- Nice-to-have certifications: AWS Certified Solutions Architect, Azure Data Engineer Associate, Databricks Certified Associate Developer for Apache Spark.
Benefits:
- Health, Dental, and Vision Insurance
- 15+ Days of PTO Annually
- Educational Assistance Available
- Annual Bonus (20%)
- 401(k) Matching Program
Please note: Candidates must be authorized to work in the United States to be considered at this time.