CoreWeave, Inc.
Senior Data Engineer - Dimensional Modeling and Metric Generation
CoreWeave, Inc., Sunnyvale, California, United States, 94087
About the Team
The Data Engineering team builds foundational datasets and analytics services that enable BI and data science across CoreWeave. We seek to democratize insights and foster a culture where data-driven decision-making thrives at every level.
About the Role
We’re seeking a skilled Senior Data Engineer to lead the development of foundational data models that empower our Business Intelligence Engineers, analysts, and data scientists to efficiently work with and gain insights from our data. This role will own the creation and maintenance of star and snowflake schemas within our lakehouse environment and set the standards for dimensional modeling best practices. The engineer will also create and optimize key datasets and metrics essential to tracking business health.
Responsibilities
Develop and maintain data models, including star and snowflake schemas, to support analytical needs across the organization (see the sketch after this list).
Establish and enforce best practices for dimensional modeling in our lakehouse.
Engineer and optimize data storage using analytical table/file formats (e.g., Iceberg, Parquet, Avro, ORC).
Partner with BI, analytics, and data science teams to design datasets that accurately reflect business metrics.
Tune and optimize datasets and queries in MPP databases such as StarRocks, Snowflake, BigQuery, or Redshift.
Collaborate on data workflows using Airflow, building and managing pipelines that power our analytical infrastructure.
Ensure efficient processing of large datasets through distributed computing frameworks like Spark or Flink.
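For illustration, the modeling work above might look like the following minimal PySpark sketch, which joins raw events to a conformed dimension and writes a date-partitioned Iceberg fact table. All catalog, table, and column names are hypothetical examples, not CoreWeave's actual models.

    # Minimal sketch: building a star-schema fact table as an Iceberg table.
    # All names below are hypothetical; assumes an Iceberg catalog named
    # "lakehouse" is already configured for the Spark cluster.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("build_fact_gpu_usage")
        .config("spark.sql.catalog.lakehouse",
                "org.apache.iceberg.spark.SparkCatalog")
        .getOrCreate()
    )

    # Raw usage events joined to a conformed customer dimension.
    events = spark.table("lakehouse.raw.usage_events")
    dim_customer = spark.table("lakehouse.marts.dim_customer")

    fact_gpu_usage = (
        events.join(dim_customer, "customer_id")
        .select(
            F.col("customer_sk"),                     # surrogate key into dim_customer
            F.to_date("event_ts").alias("date_key"),  # grain: one row per event per day
            F.col("gpu_hours"),                       # additive measure
        )
    )

    # Partitioning the fact table by date keeps large scans prunable.
    (
        fact_gpu_usage.writeTo("lakehouse.marts.fact_gpu_usage")
        .partitionedBy(F.col("date_key"))
        .createOrReplace()
    )

A snowflake variant would normalize dim_customer further into sub-dimensions; the denormalized star form above typically keeps analyst queries simpler.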
Qualifications
You thrive in a fast-paced, complex work environment and love tackling hard problems.
Hands-on experience applying Kimball modeling principles to large datasets.
Expertise in working with analytical table/file formats, including Iceberg, Parquet, Avro, and ORC.
Proven experience optimizing MPP databases (StarRocks, Snowflake, BigQuery, Redshift).
5+ years of programming experience in Python or Scala.
Advanced SQL skills, with a strong ability to write, optimize, and debug complex queries.
Hands-on experience with Airflow for batch orchestration and with distributed computing frameworks like Spark or Flink (a minimal Airflow sketch follows this list).
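As a minimal sketch of the Airflow side, the DAG below chains a Spark fact-table build to a downstream metrics refresh. The dag_id, task names, and script paths are hypothetical illustrations, and the schedule argument assumes Airflow 2.4 or later.

    # Minimal Airflow sketch: a daily batch pipeline with two dependent tasks.
    # dag_id, task_ids, and script paths are hypothetical examples.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_fact_gpu_usage",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        # Rebuild the Iceberg fact table (see the Spark sketch above).
        build_fact = BashOperator(
            task_id="build_fact_gpu_usage",
            bash_command="spark-submit jobs/build_fact_gpu_usage.py",
        )

        # Refresh metric rollups once the fact table has landed.
        refresh_metrics = BashOperator(
            task_id="refresh_daily_metrics",
            bash_command="python jobs/refresh_daily_metrics.py",
        )

        build_fact >> refresh_metrics

In practice the build step might use a SparkSubmitOperator or a managed submission service; BashOperator keeps the sketch dependency-free.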