Hi @Phani1
Using Apache Iceberg instead of Delta Lake to store data in Databricks can unlock cross-platform compatibility, but it comes with several potential challenges, especially within the Databricks ecosystem, which is natively optimized for Delta Lake.
Key Challenges When Using Iceberg Instead of Delta in Databricks
1. Feature Parity and Platform Support
Delta Lake is first-class in Databricks with full support for advanced features like:
- Time travel, schema evolution, Z-ordering
- Change data capture (CDC)
- Unity Catalog lineage, data masking, constraints
Iceberg support is newer and more limited in Databricks:
- Not all Delta-native features are supported yet.
- CDC and Z-ordering are not currently available for Iceberg tables (see the sketch below).
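As a quick illustration, here is a minimal sketch of Delta-native operations that have no direct Iceberg equivalent in Databricks today. The table and column names (sales_delta, customer_id) are hypothetical, and the CDC read assumes change data feed is enabled on the table:

```python
# Runs in a Databricks notebook, where `spark` is the built-in SparkSession.
# Table/column names are hypothetical.

# Delta time travel: query the table as of an earlier point in time.
df_yesterday = spark.sql("SELECT * FROM sales_delta TIMESTAMP AS OF '2025-06-01'")

# Z-ordering: co-locate data for faster selective queries (Delta only).
spark.sql("OPTIMIZE sales_delta ZORDER BY (customer_id)")

# Change data capture: read row-level changes since version 10
# (requires delta.enableChangeDataFeed = true on the table).
changes = spark.sql("SELECT * FROM table_changes('sales_delta', 10)")
```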
2. Performance Optimization Limitations
- Delta benefits from Databricks-specific optimizations (e.g., Photon, dynamic file pruning, OPTIMIZE/ZORDER).
- Iceberg tables may see slower query performance due to:
  - Lack of automatic file compaction (compaction must be run explicitly; see the sketch below)
  - Weaker runtime query optimizations on Databricks
  - Inability to leverage features such as Delta caching
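For example, where Delta relies on OPTIMIZE and auto-compaction, Iceberg compaction is something you schedule yourself through its maintenance procedures. A sketch, assuming an Iceberg catalog named my_catalog and a hypothetical table db.sales_iceberg (whether these procedures are callable depends on how Iceberg is wired into your workspace):

```python
# Iceberg has no automatic compaction; small files are rewritten by an
# explicit maintenance call (standard Apache Iceberg Spark procedure).
# Catalog and table names are hypothetical.
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(table => 'db.sales_iceberg')
""")
```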
3. Limited Write & Maintenance Commands
- Delta supports maintenance and DML commands such as OPTIMIZE, VACUUM, MERGE INTO, and DELETE.
- Iceberg in Databricks only partially supports these, and some may behave differently or be missing altogether depending on the workspace version and configuration (see the sketch below).
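To make that concrete, these are the kinds of Delta statements you would need to re-verify (or replace) on Iceberg tables. A minimal sketch with hypothetical table names:

```python
# Upsert with MERGE INTO: standard on Delta; verify behavior on Iceberg.
spark.sql("""
    MERGE INTO sales_delta AS t
    USING sales_updates AS s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# VACUUM: remove data files no longer referenced by the Delta log
# (Iceberg uses snapshot expiration instead; see the last sketch in this post).
spark.sql("VACUUM sales_delta RETAIN 168 HOURS")
```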
4. Unity Catalog Constraints
- As of mid-2025:
1. Iceberg tables must reside in Unity Catalog to be fully supported (see the sketch below).
2. Fine-grained governance, lineage, and row-level security may be more limited than for Delta tables.
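If your workspace has managed Iceberg tables enabled (a preview capability as of mid-2025; treat that as an assumption to verify for your environment), creation goes through Unity Catalog's three-level namespace. A sketch with hypothetical catalog, schema, and table names:

```python
# Sketch: creating a managed Iceberg table in Unity Catalog.
# Assumes managed Iceberg tables are enabled in your workspace;
# catalog/schema/table names are hypothetical.
spark.sql("""
    CREATE TABLE main.analytics.sales_iceberg (
      order_id BIGINT,
      amount   DECIMAL(10, 2)
    )
    USING ICEBERG
""")
```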
5. Tooling and Compatibility
- While Iceberg is designed for multi-engine interoperability (e.g., Trino, Snowflake, Flink), many tools in Databricks pipelines (e.g., MLflow, Auto Loader, Structured Streaming) still assume Delta tables.
- Using Iceberg can break expectations in Delta-native workflows, such as:
- Streaming reads/writes (see the sketch below)
- ML feature store integrations
- Delta Sharing
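For instance, a typical Auto Loader ingestion pipeline is written against a Delta sink; pointing the same pipeline at an Iceberg table may not be supported in the same way. A sketch with hypothetical paths and table names:

```python
# Auto Loader (cloudFiles) source streaming into a Delta table.
# Source path, checkpoint location, and table name are hypothetical.
(spark.readStream
      .format("cloudFiles")                      # Auto Loader source
      .option("cloudFiles.format", "json")
      .load("/Volumes/main/raw/events/")
      .writeStream
      .format("delta")                           # Delta-native streaming sink
      .option("checkpointLocation", "/tmp/checkpoints/events")
      .toTable("events_bronze"))
```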
6. Fewer Validations and Less API Support
- Spark APIs in Databricks are richer and more stable for Delta.
- Iceberg operations may not support full schema enforcement, constraints, or write audit capabilities (see the sketch below).
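As an example, these Delta constraint statements are the sort of enforcement you would need to confirm (or re-implement) when moving a table to Iceberg. Table and column names are hypothetical:

```python
# Delta-style constraints enforced at write time on Delta tables.
spark.sql("ALTER TABLE sales_delta ALTER COLUMN order_id SET NOT NULL")
spark.sql("""
    ALTER TABLE sales_delta
    ADD CONSTRAINT valid_amount CHECK (amount >= 0)
""")
```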
When to Consider Iceberg
- Use Apache Iceberg only if:
- You need interoperability across multiple engines (e.g., Trino, Presto, Snowflake).
- You have an enterprise data lake architecture that mandates open table formats beyond Spark.
- You're ready to invest in managing performance manually (compaction, snapshot cleanup, etc.; see the sketch below).
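The kind of manual maintenance that implies looks roughly like this, using standard Apache Iceberg Spark procedures (catalog and table names are hypothetical, and availability depends on how Iceberg is set up in your environment):

```python
# Snapshot cleanup: expire old snapshots so their data files can be removed.
spark.sql("""
    CALL my_catalog.system.expire_snapshots(
      table => 'db.sales_iceberg',
      older_than => TIMESTAMP '2025-06-01 00:00:00'
    )
""")

# Remove files not referenced by any table metadata.
spark.sql("CALL my_catalog.system.remove_orphan_files(table => 'db.sales_iceberg')")
```

(Compaction via rewrite_data_files, shown earlier, is part of the same routine.)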
LR