Hi @Phani1
Using Apache Iceberg instead of Delta Lake to store data in Databricks can unlock cross-platform compatibility, but it comes with several potential challenges, especially within the Databricks ecosystem, which is natively optimized for Delta Lake.
Key Challenges When Using Iceberg Instead of Delta in Databricks
1. Feature Parity and Platform Support
Delta Lake is first-class in Databricks with full support for advanced features like:
- Time travel, schema evolution, Z-ordering
- Change data capture (CDC)
- Unity Catalog lineage, data masking, constraints
Iceberg support is newer and more limited in Databricks:
- Not all Delta-native features are supported yet.
- CDC and Z-ordering are not currently available for Iceberg tables (see the sketch below).
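As a quick illustration, here is a minimal sketch of Delta-native operations that have no direct Iceberg equivalent in Databricks today. The table and column names (sales_delta, customer_id) are hypothetical, and the CDC read assumes change data feed is enabled on the table:

```python
# Runs in a Databricks notebook, where `spark` is the built-in SparkSession.
# Table/column names are hypothetical.

# Delta time travel: query the table as of an earlier point in time.
df_yesterday = spark.sql("SELECT * FROM sales_delta TIMESTAMP AS OF '2025-06-01'")

# Z-ordering: co-locate data for faster selective queries (Delta only).
spark.sql("OPTIMIZE sales_delta ZORDER BY (customer_id)")

# Change data capture: read row-level changes since version 10
# (requires delta.enableChangeDataFeed = true on the table).
changes = spark.sql("SELECT * FROM table_changes('sales_delta', 10)")
```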
2. Performance Optimization Limitations
- Delta benefits from Databricks-specific optimizations (e.g., Photon, dynamic file pruning, OPTIMIZE/ZORDER).
- Iceberg tables may see slower query performance due to:
  - Lack of automatic file compaction (compaction must be run explicitly; see the sketch below)
  - Weaker runtime query optimizations on Databricks
  - Inability to leverage features such as Delta caching
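For example, where Delta relies on OPTIMIZE and auto-compaction, Iceberg compaction is something you schedule yourself through its maintenance procedures. A sketch, assuming an Iceberg catalog named my_catalog and a hypothetical table db.sales_iceberg (whether these procedures are callable depends on how Iceberg is wired into your workspace):

```python
# Iceberg has no automatic compaction; small files are rewritten by an
# explicit maintenance call (standard Apache Iceberg Spark procedure).
# Catalog and table names are hypothetical.
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(table => 'db.sales_iceberg')
""")
```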
3. Limited Write & Maintenance Commands
- Delta supports maintenance and DML commands such as OPTIMIZE, VACUUM, MERGE INTO, and DELETE.
- Iceberg in Databricks only partially supports these, and some may behave differently or be missing altogether depending on the workspace version and configuration (see the sketch below).
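To make that concrete, these are the kinds of Delta statements you would need to re-verify (or replace) on Iceberg tables. A minimal sketch with hypothetical table names:

```python
# Upsert with MERGE INTO: standard on Delta; verify behavior on Iceberg.
spark.sql("""
    MERGE INTO sales_delta AS t
    USING sales_updates AS s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# VACUUM: remove data files no longer referenced by the Delta log
# (Iceberg uses snapshot expiration instead; see the last sketch in this post).
spark.sql("VACUUM sales_delta RETAIN 168 HOURS")
```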
4. Unity Catalog Constraints
- As of mid-2025:
1. Iceberg tables must reside in Unity Catalog to be fully supported (see the sketch below).
2. Fine-grained governance, lineage, and row-level security may be more limited than for Delta tables.
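If your workspace has managed Iceberg tables enabled (a preview capability as of mid-2025; treat that as an assumption to verify for your environment), creation goes through Unity Catalog's three-level namespace. A sketch with hypothetical catalog, schema, and table names:

```python
# Sketch: creating a managed Iceberg table in Unity Catalog.
# Assumes managed Iceberg tables are enabled in your workspace;
# catalog/schema/table names are hypothetical.
spark.sql("""
    CREATE TABLE main.analytics.sales_iceberg (
      order_id BIGINT,
      amount   DECIMAL(10, 2)
    )
    USING ICEBERG
""")
```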
5. Tooling and Compatibility
- While Iceberg is designed for multi-engine interoperability (e.g., Trino, Snowflake, Flink), many tools in Databricks pipelines (e.g., MLflow, Auto Loader, Structured Streaming) still assume Delta tables.
- Using Iceberg can break expectations in Delta-native workflows, such as:
- Streaming reads/writes (see the sketch below)
- ML feature store integrations
- Delta Sharing
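For instance, a typical Auto Loader ingestion pipeline is written against a Delta sink; pointing the same pipeline at an Iceberg table may not be supported in the same way. A sketch with hypothetical paths and table names:

```python
# Auto Loader (cloudFiles) source streaming into a Delta table.
# Source path, checkpoint location, and table name are hypothetical.
(spark.readStream
      .format("cloudFiles")                      # Auto Loader source
      .option("cloudFiles.format", "json")
      .load("/Volumes/main/raw/events/")
      .writeStream
      .format("delta")                           # Delta-native streaming sink
      .option("checkpointLocation", "/tmp/checkpoints/events")
      .toTable("events_bronze"))
```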
6. Fewer Validations and Less API Support
- Spark APIs in Databricks are richer and more stable for Delta.
- Iceberg operations may not support full schema enforcement, constraints, or write audit capabilities (see the sketch below).
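As an example, these Delta constraint statements are the sort of enforcement you would need to confirm (or re-implement) when moving a table to Iceberg. Table and column names are hypothetical:

```python
# Delta-style constraints enforced at write time on Delta tables.
spark.sql("ALTER TABLE sales_delta ALTER COLUMN order_id SET NOT NULL")
spark.sql("""
    ALTER TABLE sales_delta
    ADD CONSTRAINT valid_amount CHECK (amount >= 0)
""")
```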
When to Consider Iceberg
- Use Apache Iceberg only if:
- You need interoperability across multiple engines (e.g., Trino, Presto, Snowflake).
- You have an enterprise data lake architecture that mandates open table formats beyond Spark.
- You're ready to invest in managing performance manually (compaction, snapshot cleanup, etc.; see the sketch below).
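The kind of manual maintenance that implies looks roughly like this, using standard Apache Iceberg Spark procedures (catalog and table names are hypothetical, and availability depends on how Iceberg is set up in your environment):

```python
# Snapshot cleanup: expire old snapshots so their data files can be removed.
spark.sql("""
    CALL my_catalog.system.expire_snapshots(
      table => 'db.sales_iceberg',
      older_than => TIMESTAMP '2025-06-01 00:00:00'
    )
""")

# Remove files not referenced by any table metadata.
spark.sql("CALL my_catalog.system.remove_orphan_files(table => 'db.sales_iceberg')")
```

(Compaction via rewrite_data_files, shown earlier, is part of the same routine.)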
LR