
Potential Challenges of Using Iceberg Format (Databricks + Iceberg)

Phani1
Valued Contributor II

 

Hi Team,

What are the potential challenges of using the Iceberg format instead of Delta for saving data in Databricks?

Regards,

Phani

2 REPLIES

SP_6721
Contributor

Hi @Phani1 

From my understanding, Delta Lake tends to perform better for frequent data modifications such as updates, deletes, and merges. It also integrates more seamlessly with Databricks features and tools than Iceberg does.

lingareddy_Alva
Honored Contributor II

Hi @Phani1 

Using Apache Iceberg instead of Delta Lake for saving data in Databricks can unlock cross-platform compatibility, but it comes with several potential challenges, especially within the Databricks ecosystem, which is natively optimized for Delta Lake.

Key Challenges When Using Iceberg Instead of Delta in Databricks
1. Feature Parity and Platform Support
Delta Lake is first-class in Databricks with full support for advanced features like:
- Time travel, schema evolution, Z-ordering
- Change data capture (CDC)
- Unity Catalog lineage, data masking, constraints
Iceberg support is newer and more limited in Databricks:
- Not all Delta-native features are supported yet.
- CDC and Z-ordering are not currently available for Iceberg tables (see the sketch below).
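
As a quick illustration of two of those Delta-native features, here is a minimal PySpark sketch; the table name main.sales.orders and the version numbers are hypothetical, and Change Data Feed must already be enabled on the table:

# Delta time travel: read the table as it was at an earlier version
df_v1 = spark.read.option("versionAsOf", 1).table("main.sales.orders")

# Delta CDC via Change Data Feed: row-level changes between two versions.
# Requires TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true') on the table.
changes = spark.sql("SELECT * FROM table_changes('main.sales.orders', 1, 3)")
changes.show()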

2. Performance Optimization Limitations
- Delta benefits from Databricks-specific optimizations (e.g., Photon, dynamic file pruning, OPTIMIZE/ZORDER); see the example below.
- Iceberg tables may see slower query performance due to:
  - Lack of automatic file compaction
  - Weaker runtime query optimizations on Databricks
  - Inability to leverage features like Delta caching
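
For example, on a Delta table, file compaction and data clustering are a single command (a sketch; the table and column names are hypothetical):

# Compact small files and co-locate rows by a frequently filtered column.
# ZORDER BY has no equivalent for Iceberg tables in Databricks today.
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (customer_id)")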

3. Limited Write & Maintenance Commands
- Delta fully supports maintenance and write commands such as:
  OPTIMIZE, VACUUM, MERGE INTO, DELETE
- Iceberg in Databricks only partially supports these, and some may behave differently or be missing altogether depending on the workspace version and configuration (a Delta MERGE example follows below).
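
As a reference point, this is what a Delta upsert looks like (a sketch; the target table main.sales.orders and the source view updates are hypothetical, with matching schemas assumed):

# Upsert new and changed rows from a staging view into a Delta table
spark.sql("""
    MERGE INTO main.sales.orders AS t
    USING updates AS s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")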

4. Unity Catalog Constraints
- As of mid-2025:
  1. Iceberg tables must reside in Unity Catalog to be fully supported.
  2. Fine-grained governance, lineage, and row-level security may be more limited than for Delta tables (an example of the Delta-side capability follows below).
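
For comparison, this is how row-level security is attached to a Delta table in Unity Catalog (a minimal sketch; the function and table names are hypothetical, and support for Iceberg tables may differ):

# Define a boolean row-filter function in Unity Catalog...
spark.sql("""
    CREATE OR REPLACE FUNCTION main.sales.us_only(region STRING)
    RETURNS BOOLEAN
    RETURN region = 'US'
""")
# ...and attach it so queries only return rows where the filter is true
spark.sql("ALTER TABLE main.sales.orders SET ROW FILTER main.sales.us_only ON (region)")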

5. Tooling and Compatibility
- While Iceberg is designed for multi-engine interoperability (e.g., Trino, Snowflake, Flink), many tools in Databricks pipelines (e.g., MLflow, Auto Loader, Structured Streaming) still assume Delta tables.
- Using Iceberg can break expectations in Delta-native workflows, such as (see the streaming sketch below):
  - Streaming reads/writes
  - ML feature store integrations
  - Delta Sharing
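
For instance, a Delta table works directly as both a streaming source and sink (a sketch; the table names and checkpoint path are hypothetical):

# Incrementally read a Delta table as a stream and mirror it to another table
(spark.readStream.table("main.sales.orders")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/sales/checkpoints/orders_mirror")
    .toTable("main.sales.orders_mirror"))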
6. Fewer Validations & Weaker API Support
- Spark APIs in Databricks are richer and more stable for Delta.
- Iceberg operations may not support full schema enforcement, constraints, or write-audit capabilities (see the Delta constraint example below).
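
By contrast, Delta tables enforce declared constraints on every write (a sketch; the table and column names are hypothetical):

# Add a CHECK constraint; any write with amount < 0 will now fail
spark.sql("ALTER TABLE main.sales.orders ADD CONSTRAINT valid_amount CHECK (amount >= 0)")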

When to Consider Iceberg
Use Apache Iceberg only if:
- You need interoperability across multiple engines (e.g., Trino, Presto, Snowflake).
- You have an enterprise data lake architecture that mandates open table formats beyond Spark.
- You're ready to invest in managing performance manually (compaction, snapshot cleanup, etc.); see the sketch below for one middle-ground option.
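
One possible middle ground, if interoperability is the main driver, is Delta UniForm: the table stays Delta inside Databricks while Iceberg metadata is generated for external engines. A minimal sketch, assuming a recent DBR version and Unity Catalog (the table name and columns are hypothetical):

# Create a Delta table that also exposes Iceberg metadata (UniForm)
spark.sql("""
    CREATE TABLE main.sales.orders_uniform (order_id BIGINT, amount DOUBLE)
    TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")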

 

LR
