Smart IT Frame LLC
We are looking for an experienced Apache Iceberg Engineer to design, develop, and optimize large-scale data lakehouse solutions built on Apache Iceberg. The ideal candidate will have expertise in big data processing frameworks (Apache Spark, Flink, Presto, Trino, Hive) and cloud-based data platforms such as AWS S3, Google Cloud Storage, or Azure Data Lake Storage. You will work closely with data engineers, data scientists, and DevOps teams to ensure an efficient, scalable, and reliable data architecture.

Key Responsibilities:
- Design, implement, and optimize Iceberg-based data lake architectures for large-scale datasets.
- Develop data ingestion, transformation, and query optimization pipelines using Spark, Flink, or Presto/Trino.
- Ensure ACID compliance, schema evolution, and partition evolution in Iceberg tables (illustrated in the sketch after this list).
- Implement time travel, versioning, and snapshot management for historical data analysis (also shown in the sketch below).
- Optimize metadata management and query performance in Iceberg-based data lakes.
- Integrate Apache Iceberg with cloud storage solutions (AWS S3, GCS, ADLS) and data warehouses.
- Implement best practices for data governance, access control, and security within an Iceberg-based environment.
- Troubleshoot performance issues, metadata inefficiencies, and schema inconsistencies in Iceberg tables.
- Collaborate with DevOps, ML engineers, and BI teams to enable smooth data workflows.
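For context on several of the responsibilities above, the sketch below shows how schema evolution, partition evolution, time travel, and snapshot management look in Spark SQL with the Iceberg extensions enabled. It is a minimal sketch, not part of any actual codebase for this role: the catalog `demo`, namespace `db`, table `events`, and the warehouse path are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch: assumes the iceberg-spark-runtime package is on the
# classpath; "demo", "db", "events", and the warehouse path are placeholders.
spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)

# Create a day-partitioned Iceberg table (hidden partitioning via a transform).
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        id BIGINT,
        ts TIMESTAMP,
        payload STRING)
    USING iceberg
    PARTITIONED BY (days(ts))
""")

# Schema evolution: add a column; no existing data files are rewritten.
spark.sql("ALTER TABLE demo.db.events ADD COLUMNS (source STRING)")

# Partition evolution: route new writes to hourly partitions
# ("ts_day" is Iceberg's default name for the days(ts) partition field).
spark.sql("ALTER TABLE demo.db.events REPLACE PARTITION FIELD ts_day WITH hours(ts)")

# Time travel: read the table as of an earlier point in time (Spark 3.3+ syntax).
spark.sql(
    "SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'"
).show()

# Snapshot management: inspect history, then expire snapshots older than a cutoff.
spark.sql(
    "SELECT committed_at, snapshot_id, operation FROM demo.db.events.snapshots"
).show()
spark.sql("""
    CALL demo.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2024-01-01 00:00:00')
""")
```

Because Iceberg tracks partition specs in table metadata, the REPLACE PARTITION FIELD call affects only newly written data; existing files keep their original layout.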
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 3+ years of experience in Big Data, Data Engineering, or Cloud Data Warehousing.
- Hands-on experience with Apache Iceberg in a production environment.
- Strong expertise in Apache Spark, Flink, Trino, Presto, or Hive for big data processing.
- Proficiency in SQL and distributed query engines.
- Experience working with cloud storage solutions (AWS S3, GCS, ADLS).
- Knowledge of data lakehouse architectures and modern data management principles.
- Familiarity with schema evolution, ACID transactions, and partitioning techniques.
- Experience with Python, Scala, or Java for data processing.
Preferred Qualifications:
- Experience in real-time data processing using Flink or Kafka.
- Understanding of data governance, access control, and compliance frameworks.
- Knowledge of other data lake frameworks such as Delta Lake (Databricks) or Apache Hudi.
- Hands-on experience with Terraform, Kubernetes, or Airflow for data pipeline automation (see the Airflow sketch after this list).
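As a hypothetical illustration of the last item above, routine Iceberg table maintenance is commonly scheduled with Airflow. The DAG below is a sketch under assumptions: the DAG id, task id, and table names are placeholders carried over from the earlier sketch, and it presumes a Spark environment reachable from the Airflow worker (Airflow 2.4+ `schedule` argument).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_iceberg_maintenance():
    # Placeholder job: compact small files, then expire old snapshots.
    # Both CALLs are standard Iceberg Spark procedures; "demo" and
    # "db.events" are hypothetical names, not this role's real tables.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()
    spark.sql("CALL demo.system.rewrite_data_files(table => 'db.events')")
    spark.sql("CALL demo.system.expire_snapshots(table => 'db.events')")
    spark.stop()

with DAG(
    dag_id="iceberg_table_maintenance",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # Airflow 2.4+ argument name
    catchup=False,
) as dag:
    PythonOperator(
        task_id="maintain_events_table",
        python_callable=run_iceberg_maintenance,
    )
```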
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Other
Industries: Software Development; IT Services and IT Consulting