Diamondpick
GCP Data Engineer
Diamondpick, Burbank, California, United States, 91520
The GCP Data Engineer will be responsible for designing and building large-scale cloud data processing systems on the Google Cloud Platform (GCP). This role involves curating a comprehensive data set that captures users, groups, and their permissions to various data sets. The engineer will redesign and implement a scalable data pipeline to ensure timely updates and transparency into data access.
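The posting does not describe the existing pipeline in detail, so the following is only a minimal, hypothetical Python sketch of the kind of work involved: reading a project's IAM bindings and appending them to a BigQuery table that a dashboard can query. The function name, project ID, and table ID are illustrative placeholders, not part of the actual system.

```python
from googleapiclient import discovery
from google.cloud import bigquery


def export_iam_bindings(project_id: str, table_id: str) -> None:
    """Fetch a project's IAM bindings and append them to a BigQuery table."""
    # Cloud Resource Manager v1: getIamPolicy returns role -> members bindings.
    crm = discovery.build("cloudresourcemanager", "v1")
    policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()

    # Flatten each (role, member) pair into one row for the dashboard dataset.
    rows = [
        {"project_id": project_id, "role": binding["role"], "member": member}
        for binding in policy.get("bindings", [])
        for member in binding.get("members", [])
    ]

    bq = bigquery.Client()
    errors = bq.insert_rows_json(table_id, rows)  # streaming insert
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    export_iam_bindings("example-project", "example-project.iam_audit.bindings")
```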
REQUIRED SKILLS:
• 5+ years of experience in an engineering role using Python, Java, Spark, and SQL.
• 5+ years of experience working as a Data Engineer in GCP.
• Demonstrated proficiency with Google's Identity and Access Management (IAM) API.
• Demonstrated proficiency with Airflow.

Key Responsibilities:
• Design, develop, and implement scalable, high-performance data solutions on GCP.
• Ensure that changes to data access permissions are reflected in the Tableau dashboard within 24 hours (a minimal scheduling sketch follows this list).
• Collaborate with technical and business users to share and manage data sets across multiple projects.
• Utilize GCP tools and technologies to optimize data processing and storage.
• Re-architect the data pipeline that builds the BigQuery dataset used for GCP IAM dashboards to make it more scalable.
• Run and customize DLP scans.
• Build bidirectional integrations between GCP and Collibra.
• Explore and potentially implement Dataplex and custom format-preserving encryption for de-identifying data for developers in lower environments.

Qualifications - Required:
• Bachelor's degree in Computer Engineering or a related field.
• 5+ years of experience in an engineering role using Python, Java, Spark, and SQL.
• 5+ years of experience working as a Data Engineer in GCP.
• Proficiency with Google's Identity and Access Management (IAM) API.
• Strong Linux/Unix background and hands-on knowledge.
• Experience with big data technologies such as HDFS, Spark, Impala, and Hive.
• Experience with shell scripting and Bash.
• Experience with version control platforms like GitHub.
• Experience with unit testing code.
• Experience with development ecosystems including Jenkins, Artifactory, CI/CD, and Terraform.
• Demonstrated proficiency with Airflow.
• Ability to advise management on approaches to optimize for data platform success.
• Ability to effectively communicate highly technical information to various audiences, including management, the user community, and less-experienced staff.
• Proficiency in multiple programming languages, frameworks, domains, and tools.
• Coding skills in Scala.
• Experience with GCP platform development tools such as Pub/Sub, Cloud Storage, Bigtable, BigQuery, Dataflow, Dataproc, and Composer.
• Knowledge of Hadoop, cloud platforms, and their surrounding ecosystems.
• Experience with web services and APIs (RESTful and SOAP).
• Ability to document designs and concepts.
• API orchestration and choreography for consumer apps.
• Well-rounded technical expertise in Apache packages and hybrid cloud architectures.
• Pipeline creation and automation for data acquisition.
• Metadata extraction pipeline design and creation between raw and transformed datasets.
• Quality-control metrics collection on data acquisition pipelines.
• Experience contributing to and leveraging Jira and Confluence.
• Strong experience working with real-time streaming applications and batch-style large-scale distributed computing applications using tools like Spark, Kafka, Flume, Pub/Sub, and Airflow.
• Ability to work with different file formats like Avro, Parquet, and JSON.
• Hands-on experience in the Analysis, Design, Coding, and Testing phases of the Software Development Life Cycle (SDLC).
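As noted in the responsibilities above, the dashboard dataset must stay within a 24-hour freshness window, with Airflow named as the orchestrator. Below is a minimal Airflow sketch of a daily refresh under those assumptions; the DAG ID, task callable, and retry settings are hypothetical placeholders, not the actual pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def refresh_iam_dataset() -> None:
    # Placeholder: rebuild the BigQuery dataset behind the IAM dashboard,
    # e.g. by calling an export routine like the sketch earlier in this posting.
    ...


with DAG(
    dag_id="iam_dashboard_refresh",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",          # one run per day keeps the 24-hour SLA
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=15)},
) as dag:
    PythonOperator(
        task_id="refresh_iam_dataset",
        python_callable=refresh_iam_dataset,
    )
```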