JobRialto
GCP Data Engineer
JobRialto, Dallas, Texas, 75215
Job Summary:
We are seeking an experienced GCP Data Engineer to design, develop, and implement large-scale data processing systems on the Google Cloud Platform (GCP). The engineer will be responsible for curating a comprehensive data set that includes information about users, groups, and their access permissions to various data sets. This role also involves redesigning and optimizing the existing data pipeline for scalability and efficiency, ensuring timely updates and transparency in data access across multiple platforms.

Key Responsibilities:
Data Pipeline Development: Design, develop, and implement scalable, high-performance data solutions on GCP to process large datasets efficiently.
Data Access Management: Curate and manage a comprehensive data set detailing user permissions and group memberships.
Pipeline Optimization: Redesign the existing data pipeline to improve scalability, reduce processing time, and ensure efficient data flow.
Real-time Data Updates: Ensure that any changes to data access permissions are reflected in the Tableau dashboard within 24 hours.
Collaboration & Data Sharing: Collaborate with both technical teams and business users to share and manage data sets across multiple projects and environments.
GCP Tools & Technologies: Utilize GCP services and tools such as Pub/Sub, BigQuery, Dataflow, Dataproc, and Composer to optimize data processing and storage.
Re-architecture & Integration: Re-architect the data pipeline that builds the BigQuery dataset for GCP IAM dashboards to enhance scalability (a minimal illustrative sketch of this kind of step appears after the qualifications below). Build bidirectional integrations between GCP and Collibra.
Data Loss Prevention (DLP): Run and customize DLP scans to ensure sensitive data is properly handled.
Data Encryption & De-identification: Explore and implement Dataplex and custom format-preserving encryption for de-identifying data in lower environments.
Documentation & Process Improvement: Document designs, methodologies, and best practices. Recommend improvements to management regarding new processes, tools, and techniques to optimize platform performance.
Collaboration & Support: Work closely with Scrum teams, including Scrum Masters, Product Owners, Data Analysts, QA teams, and Data Architects, to deliver high-quality data solutions.

Required Qualifications:
Education: Bachelor's degree in Computer Engineering, Data Engineering, or a related field.
Experience: 5 years of experience in an engineering role with hands-on experience in Python, Java, Spark, and SQL. 5 years of experience working as a Data Engineer within GCP.
Technical Skills:
Proficiency with GCP services such as Pub/Sub, Cloud Storage, Bigtable, BigQuery, Dataflow, Dataproc, and Composer.
Strong knowledge of Google's Identity and Access Management (IAM) API.
Experience with big data technologies such as HDFS, Spark, Impala, and Hive.
Proficiency in shell scripting (Bash) and version control using GitHub.
Strong experience with Airflow for orchestration (see the orchestration sketch at the end of this posting).
Familiarity with Hadoop ecosystems and cloud platforms.
Proficiency with real-time streaming applications and large-scale batch processing using tools such as Spark, Kafka, Flume, Pub/Sub, and Airflow.
Experience working with various file formats such as Avro, Parquet, and JSON.
Experience with web services and APIs (RESTful and SOAP).
Problem Solving & Analysis:
Ability to analyze complex data sets and exercise independent judgment on moderately complex issues.
Proficiency in creating and optimizing data pipelines and automation for data acquisition.
Ability to collaborate with cross-functional teams to improve data workflows and implement best practices.
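For illustration only, and not part of the formal requirements: a minimal sketch of the kind of step behind the Re-architecture & Integration responsibility above, reading project-level IAM bindings and streaming them into a BigQuery table. This is not the team's actual pipeline; the project ID and table name are hypothetical placeholders.

```python
# Illustrative sketch: project IAM bindings -> BigQuery rows.
from googleapiclient import discovery
from google.cloud import bigquery

PROJECT_ID = "example-project"                      # hypothetical project
TABLE_ID = "example-project.iam_audit.bindings"     # hypothetical table

def fetch_iam_bindings(project_id):
    """Return one row per (role, member) pair from the project IAM policy."""
    crm = discovery.build("cloudresourcemanager", "v1")
    policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
    rows = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            rows.append({"project": project_id,
                         "role": binding["role"],
                         "member": member})
    return rows

def load_to_bigquery(rows):
    """Stream the rows into BigQuery; insert_rows_json returns any row errors."""
    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")

if __name__ == "__main__":
    load_to_bigquery(fetch_iam_bindings(PROJECT_ID))
```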
Preferred Qualifications:
Programming Languages: Proficiency in additional programming languages such as Scala.
Cloud Platform Expertise: Experience with hybrid cloud architectures and the GCP platform's surrounding ecosystem.
Data Management: Knowledge of metadata extraction pipelines and quality control metrics for data acquisition pipelines.
Scrum & Agile: Experience with Agile methodologies and tools such as Jira and Confluence for project management.
Data Security & Encryption: Experience with format-preserving encryption techniques and data loss prevention (DLP) scanning.
Big Data Technologies: Hands-on experience with Dataplex for data lake management.
Cloud Integration: Experience building integrations between GCP and other platforms, such as Collibra.
Certifications: Google Cloud Professional Data Engineer certification is a plus. Certifications in big data technologies such as Spark, Hadoop, or related fields are highly preferred.
Education: Bachelor's Degree
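As a further illustration of the Airflow orchestration and 24-hour refresh expectations described above: a minimal sketch of a daily DAG that refreshes the access dataset feeding the Tableau dashboard. The DAG ID, schedule, and callable are hypothetical placeholders, not the team's production configuration.

```python
# Illustrative sketch: daily Airflow DAG refreshing the access dataset.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def refresh_access_dataset(**context):
    # Placeholder for the extract/load step, e.g. the IAM-to-BigQuery
    # routine sketched earlier in this posting.
    pass

with DAG(
    dag_id="iam_access_refresh",                # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",                 # well inside the 24-hour window
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    refresh = PythonOperator(
        task_id="refresh_access_dataset",
        python_callable=refresh_access_dataset,
    )
```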