Sparibis

Senior Data Engineer

Sparibis, Washington, District of Columbia, US, 20022

Location:

100% Remote

Years' Experience:

10+ years

Education:

Bachelor's in IT related field

Work Authorization:

Must show that applicant is legally permitted to work in the United States.

Clearance:

Applicants must be able to meet the requirements to obtain a Public Trust security clearance. NOTE: United States citizenship is required to be eligible for this security clearance.

Key Skills:

10+ years of IT experience focusing on enterprise data architecture and management

Experience with Databricks required

8+ years experience in Conceptual/Logical/Physical Data Modeling & expertise in Relational and Dimensional Data Modeling

Experience with Great Expectations or other data quality validation frameworks

Experience with ETL and ELT tools such as SSIS, Pentaho, and/or Data Migration Services

Advanced level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization)

Experience with AWS environment, CI/CD pipelines, and Python (Python 3)

Responsibilities

Plan, create, and maintain data architectures, ensuring alignment with business requirements

Obtain data, formulate dataset processes, and store optimized data

Identify problems and inefficiencies and apply solutions

Determine tasks where manual participation can be eliminated with automation

Identify and optimize data bottlenecks, leveraging automation where possible

Create and manage data lifecycle policies (retention, backups/restore, etc.)

In-depth knowledge for creating, maintaining, and managing ETL/ELT pipelines

Create, maintain, and manage data transformations

Maintain/update documentation

Create, maintain, and manage data pipeline schedules

Monitor data pipelines

Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality

Support AI/ML teams with optimizing feature engineering code

Expertise in Spark/Python/Databricks, Data Lake and SQL

Create, maintain, and manage Spark Structured Streaming jobs, including using the newer Delta Live Tables and/or DBT

Research existing data in the data lake to determine best sources for data

Create, manage, and maintain ksqlDB and Kafka Streams queries/code

Data-driven testing for data quality

Maintain and update Python-based data processing scripts executed on AWS Lambdas

Unit tests for all Spark, Python data processing, and Lambda code

Maintain PCIS Reporting Database data lake with optimizations and maintenance (performance tuning, etc.)

Streamlining data processing experience including formalizing concepts of how to handle late data, defining windows, and how window definitions impact data freshness

Qualifications

10+ years of IT experience focusing on enterprise data architecture and management

Experience in Conceptual/Logical/Physical Data Modeling & expertise in Relational and Dimensional Data Modeling

Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required

Additional experience with Spark, Spark SQL, Spark DataFrames and DataSets, and PySpark

Data Lake concepts such as time travel and schema evolution and optimization

Structured Streaming and Delta Live Tables with Databricks a bonus

Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse build, data mart build, and data lake implementation/support

Advanced level understanding of streaming data pipelines and how they differ from batch systems

Formalize concepts of how to handle late data, defining windows, and data freshness

Advanced understanding of ETL and ELT and ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.

Understanding of concepts and implementation strategies for different incremental data loads such as tumbling window, sliding window, high watermark, etc.

Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks a bonus

Understanding of streaming data pipelines and batch systems

Familiarity with concepts such as late data, defining windows, and how window definitions impact data freshness

Advanced level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization)

Indexing and partitioning strategy experience

Debug, troubleshoot, design and implement solutions to complex technical issues

Experience with large-scale, high-performance enterprise big data application deployment and solution

Understanding how to create DAGs to define workflows

Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc. a bonus but not required

Architecture experience in AWS environment a bonus

Familiarity working with Kinesis and/or Lambda, specifically with how to push and pull data, how to use AWS tools to view data in Kinesis streams, and for processing massive data at scale a bonus

Experience with Docker, Jenkins, and CloudWatch

Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines

Experience working with AWS Lambdas for configuration and optimization

Experience working with DynamoDB to query and write data

Experience with S3

Knowledge of Python (Python 3 desired) for CI/CD pipelines a bonus

Familiarity with Pytest and Unittest a bonus

Experience working with JSON and defining JSON Schemas a bonus

Experience setting up and managing Confluent/Kafka topics and ensuring performance using Kafka a bonus

Familiarity with Schema Registry, message formats such as Avro, ORC, etc.

Understanding how to manage ksqlDB SQL files and migrations and Kafka Streams

Ability to thrive in a team-based environment

Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior-level management

About Sparibis

Sparibis LLC is a professional solutions firm that clients rely on to access the best talent to drive their business success.

Sparibis is an equal opportunity employer that values diversity at all levels. All individuals, regardless of personal characteristics, are encouraged to apply.