JobRialto
Data Engineer
JobRialto, San Jose, California, United States, 95199
Job Summary:
We are looking for a highly skilled Lead Data Engineer with extensive experience in designing and developing high-performance data integration pipelines, following a CI/CD model. The ideal candidate will lead a team of data engineers, drive technical solutions, and work with a variety of cutting-edge technologies, such as GCP, Python, HANA, Kafka, and PySpark. You will be responsible for building robust and scalable data pipelines, ensuring data quality, and guiding the team through complex technical challenges. This role demands a hands-on leader who can balance platform stability with feature delivery while working in a fast-paced environment.
Key Responsibilities:
Data Pipeline Development: Design and develop high-performance data pipeline frameworks from scratch, covering data ingestion across systems; data quality and curation; data transformation and efficient storage; data reconciliation, monitoring, and controls; and support for reporting models and downstream application needs.
Technical Leadership: Lead and manage a team of data engineers. Participate in code reviews and provide guidance on best practices for building data pipelines.
Hands-on Execution: Perform Proofs of Concept (POCs) on open-source or licensed tools and provide recommendations for new tools and technologies.
Cross-platform Development: Contribute to the definition, development, integration, testing, documentation, and support of software across multiple platforms such as GCP, Python, and HANA.
Project Management Framework: Establish and maintain a consistent project management framework to deliver high-quality software in rapid iterations, collaborating with business partners across different geographies.
Problem-Solving and Troubleshooting: Participate in a team that designs, develops, troubleshoots, and debugs software programs, focusing on databases, applications, and tools.
Balance Priorities: Ensure stability of the production platform while managing feature delivery and reducing technical debt across various technologies.
Technologies and Tools:
Google Cloud Platform (GCP): Experience with GCS, BigQuery, Streaming (Pub/Sub), Dataproc, Dataflow, and NiFi.
Programming & Scripting: Strong proficiency in Python, PySpark, SQL, Shell scripting, and Stored Procedures.
Data Management: Hands-on experience with data warehousing, distributed data platforms, and data lakes.
Data Modeling & Visualization: Experience with Looker Views and Models, as well as database schema design.
CI/CD Pipelines: Familiarity with the CI/CD process to automate data pipeline deployments and testing.
Required Qualifications:
Extensive experience in designing and building data integration pipelines in a CI/CD environment.
Proven track record in scripting and coding with Python, PySpark, and SQL.
Experience with cloud platforms, specifically Google Cloud Platform (GCP), including tools like BigQuery, Dataproc, and Pub/Sub.
Experience in data warehouse management, data lake architectures, and data modeling.
Familiarity with managing and deploying data pipelines in a distributed systems environment.
Strong problem-solving and troubleshooting skills, with the ability to break down complex, multi-dimensional challenges.
Proven ability to lead a team of engineers, mentor junior team members, and perform code reviews.
Excellent communication skills, with the ability to collaborate across functional teams and manage multiple stakeholders.
Preferred Qualifications:
Hands-on experience with Kafka for real-time data streaming.
Experience with NiFi for data flow orchestration.
Familiarity with HANA for database management and analytics.
Experience in managing data pipelines in multi-cloud environments.
Certifications:
Relevant cloud certifications (e.g., Google Cloud Professional Data Engineer).
Data engineering or big data certifications.
Education:
Bachelor's degree