Smart IT Frame LLC
We are looking for an experienced Apache Iceberg Engineer to design, develop, and optimize large-scale data lakehouse solutions built on Apache Iceberg. The ideal candidate will have expertise in big data processing frameworks (Apache Spark, Flink, Presto, Trino, Hive) and cloud-based data platforms such as AWS S3, Google Cloud Storage, or Azure Data Lake Storage. You will work closely with data engineers, data scientists, and DevOps teams to ensure an efficient, scalable, and reliable data architecture.

Key Responsibilities:
- Design, implement, and optimize Iceberg-based data lake architectures for large-scale datasets.
- Develop data ingestion, transformation, and query optimization pipelines using Spark, Flink, or Presto/Trino.
- Ensure ACID compliance, schema evolution, and partition evolution in Iceberg tables (illustrated in the sketch after this list).
- Implement time travel, versioning, and snapshot management for historical data analysis (also shown in the sketch below).
- Optimize metadata management and query performance in Iceberg-based data lakes.
- Integrate Apache Iceberg with cloud storage solutions (AWS S3, GCS, ADLS) and data warehouses.
- Implement best practices for data governance, access control, and security within an Iceberg-based environment.
- Troubleshoot performance issues, metadata inefficiencies, and schema inconsistencies in Iceberg tables.
- Collaborate with DevOps, ML engineers, and BI teams to enable smooth data workflows.
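For context on several of the responsibilities above, the sketch below shows how schema evolution, partition evolution, time travel, and snapshot management look in Spark SQL with the Iceberg extensions enabled. It is a minimal sketch, not part of any actual codebase for this role: the catalog `demo`, namespace `db`, table `events`, and the warehouse path are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch: assumes the iceberg-spark-runtime package is on the
# classpath; "demo", "db", "events", and the warehouse path are placeholders.
spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)

# Create a day-partitioned Iceberg table (hidden partitioning via a transform).
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        id BIGINT,
        ts TIMESTAMP,
        payload STRING)
    USING iceberg
    PARTITIONED BY (days(ts))
""")

# Schema evolution: add a column; no existing data files are rewritten.
spark.sql("ALTER TABLE demo.db.events ADD COLUMNS (source STRING)")

# Partition evolution: route new writes to hourly partitions
# ("ts_day" is Iceberg's default name for the days(ts) partition field).
spark.sql("ALTER TABLE demo.db.events REPLACE PARTITION FIELD ts_day WITH hours(ts)")

# Time travel: read the table as of an earlier point in time (Spark 3.3+ syntax).
spark.sql(
    "SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'"
).show()

# Snapshot management: inspect history, then expire snapshots older than a cutoff.
spark.sql(
    "SELECT committed_at, snapshot_id, operation FROM demo.db.events.snapshots"
).show()
spark.sql("""
    CALL demo.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2024-01-01 00:00:00')
""")
```

Because Iceberg tracks partition specs in table metadata, the REPLACE PARTITION FIELD call affects only newly written data; existing files keep their original layout.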
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 3+ years of experience in Big Data, Data Engineering, or Cloud Data Warehousing.
- Hands-on experience with Apache Iceberg in a production environment.
- Strong expertise in Apache Spark, Flink, Trino, Presto, or Hive for big data processing.
- Proficiency in SQL and distributed query engines.
- Experience working with cloud storage solutions (AWS S3, GCS, ADLS).
- Knowledge of data lakehouse architectures and modern data management principles.
- Familiarity with schema evolution, ACID transactions, and partitioning techniques.
- Experience with Python, Scala, or Java for data processing.
Preferred Qualifications:
- Experience in real-time data processing using Flink or Kafka.
- Understanding of data governance, access control, and compliance frameworks.
- Knowledge of other data lake frameworks such as Delta Lake (Databricks) or Apache Hudi.
- Hands-on experience with Terraform, Kubernetes, or Airflow for data pipeline automation (see the Airflow sketch after this list).
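As a hypothetical illustration of the last item above, routine Iceberg table maintenance is commonly scheduled with Airflow. The DAG below is a sketch under assumptions: the DAG id, task id, and table names are placeholders carried over from the earlier sketch, and it presumes a Spark environment reachable from the Airflow worker (Airflow 2.4+ `schedule` argument).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_iceberg_maintenance():
    # Placeholder job: compact small files, then expire old snapshots.
    # Both CALLs are standard Iceberg Spark procedures; "demo" and
    # "db.events" are hypothetical names, not this role's real tables.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()
    spark.sql("CALL demo.system.rewrite_data_files(table => 'db.events')")
    spark.sql("CALL demo.system.expire_snapshots(table => 'db.events')")
    spark.stop()

with DAG(
    dag_id="iceberg_table_maintenance",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # Airflow 2.4+ argument name
    catchup=False,
) as dag:
    PythonOperator(
        task_id="maintain_events_table",
        python_callable=run_iceberg_maintenance,
    )
```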
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Other
Industries: Software Development; IT Services and IT Consulting