ClearpointCo.
Lead Data Engineer
ClearpointCo., Houston, Texas, United States, 77246
TITLE:
Lead Data Engineer
LOCATION:
Houston, Texas
TYPE:
Direct Hire
SALARY:
$220,000 - $240,000
SUMMARY:
The Lead Data Engineer will play a crucial role in architecting, implementing, and managing robust, scalable data infrastructure. The position requires a blend of systems engineering, data integration, and data analytics skills to expand the company's data capabilities in support of advanced analytics, machine learning projects, and real-time data processing.
DUTIES:
Design and implement reliable, scalable data pipelines that ingest, process, and store diverse data, using technologies such as Apache Spark, Hadoop, and Kafka.
Work within AWS or Azure cloud environments, using services such as Amazon EC2, Amazon RDS, Amazon S3, AWS Lambda, and Azure Data Lake for efficient data handling and processing.
Develop and optimize data models and storage solutions (SQL, NoSQL, data lakes) that support operational and analytical applications while ensuring data quality and accessibility.
Use ETL and orchestration tools (e.g., Apache Airflow, Talend) to automate data workflows so that data is integrated efficiently and available for analytics on time.
Collaborate closely with data scientists to provide the data infrastructure and tools that complex analytical models require, writing data processing scripts in Python or R.
Ensure compliance with data governance and security policies by applying best practices for data encryption, masking, and access control in cloud environments.
Monitor and troubleshoot data pipelines and databases for performance issues, applying tuning techniques to optimize data access and throughput.
Stay abreast of emerging technologies and methodologies in data engineering, advocating for and implementing improvements to the data ecosystem.
REQUIREMENTS:
7+ years of experience in data engineering, with a proven track record in designing and operating large-scale data pipelines and architectures.
Expertise in developing ETL/ELT workflows.
Comprehensive knowledge of platforms and services such as Databricks, Dataiku, and AWS-native data services.
Strong experience with big data technologies (Apache Spark, Hadoop, Kafka) and with AWS and Azure services for data processing and storage, including hands-on experience integrating cloud storage and compute services with Databricks.
Proficiency in SQL and in programming languages relevant to data engineering (Python, Java, Scala).
Hands-on RDBMS experience (data modeling, analysis, programming, stored procedures).
Familiarity with machine learning model deployment and management practices is a plus.
Strong communication skills and the ability to collaborate effectively with both technical and non-technical teams.
EDUCATION:
Bachelor's degree in computer science, MIS, or a related business discipline.
Master's degree in computer science, MIS, or a related business discipline preferred.
AWS Certified Solutions Architect certification preferred.
Databricks Certified Associate Developer for Apache Spark certification preferred.
Microsoft Certified: Azure Data Engineer Associate certification preferred.