
Senior Data Engineer (must be located in San Francisco, CA)

Dice, San Francisco, California, United States, 94199


Senior Data Engineer, Data Feeds

About the Role: We are seeking a Senior Data Engineer for our Data Feeds team to provide batch data processing, real-time streaming, and pipeline orchestration capabilities. You will be part of the Data Technology organization, which helps drive business decisions using data. You will apply your expertise in big data problem solving, design thinking, coding, and analysis to build data pipelines and data products on top of our petabyte-scale data. Our business is data-driven, and you will build solutions that support the company in marketing, pricing, credit, funding, investing, and many other areas that are transforming the banking industry. We are looking for talented data engineers who are passionate about building new data-driven solutions with the latest big data technology.

What you'll do:

Create and maintain optimal data pipeline architecture.

Build data pipelines that transform raw, unstructured data into formats that data analysts can use for analysis.

Assemble large, complex data sets that meet functional/non-functional business requirements.

Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.

Build the infrastructure required for optimal extraction, transformation, and delivery of data from a wide variety of data sources using SQL and AWS Big Data technologies.

Work with stakeholders, including the Executive, Product, Engineering, and Program teams, to assist with data-related technical issues and support their data infrastructure needs.

Develop and maintain scalable data pipelines, and build new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of sources using scalable, distributed data technologies.

Implement processes and systems to validate data and monitor data quality, ensuring that production data is always accurate and available to the key stakeholders and business processes that depend on it.

Write unit and integration tests, practice test-driven development, contribute to the engineering wiki, and document your work.

Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.

About you:

6+ years of experience and a bachelor's degree in Computer Science, Informatics, Information Systems, or a related field; or equivalent work experience.

In-depth working experience with distributed systems such as Hadoop/MapReduce, Spark, Hive, Kafka, and Oozie/Airflow.

At least 5 years of solid, production-quality coding experience implementing data pipelines in Java, Scala, and Python.

Experience with AWS cloud services: EC2, EMR, RDS.

Experience with Git, Jira, Jenkins, and shell scripting.

Familiarity with Agile methodology, test-driven development, source control management, and test automation.

Experience supporting and working with cross-functional teams in a dynamic environment.

You are passionate about data and building efficient data pipelines. You have excellent listening skills and are empathetic to others. You believe in simple, elegant solutions and place paramount importance on quality. You have a track record of building fast, reliable, high-quality data pipelines.

Nice-to-have skills:

Experience building marketing data pipelines, including direct mail, is a big plus.

Experience with Snowflake and Salesforce Marketing Cloud.

Working knowledge of open-source ML frameworks and the end-to-end model development life cycle.

Prior experience running containers (Docker/LXC) in production using a container orchestration service (Kubernetes, AWS ECS, or AWS EKS).
