Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Databricks + Apache Iceberg = advantageous or wasted effort due to duplicate functionality ?

ag2all
New Contributor

Trying to design a Lakehouse with Spark at the base layer. Now wondering whether adding Apache Iceberg below Spark will help or not. I'm leaning toward Iceberg for its automatic indexing and ACID query facilities over big heterogeneous datasets. Is it a wise choice?

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @ag2all, Integrating Apache Iceberg into your Lakehouse architecture alongside Spark can be a smart move, especially given your focus on auto indexing and ACID transaction support. Iceberg's robust metadata and indexing capabilities can significantly enhance query performance, making it easier to manage large, heterogeneous datasets. Its support for full ACID transactions ensures data consistency, which is crucial for reliable data operations. Additionally, Iceberg's features, like schema evolution and time travel, add flexibility and convenience, allowing you to query historical data and adapt to changing data requirements without expensive rewrites. Plus, its compatibility with various query engines like Spark, Trino, and Flink offers flexibility in data processing. Overall, adding Iceberg seems like a wise choice for enhancing your data management and query optimization capabilities in the Lakehouse setup.
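A minimal sketch of the Iceberg features mentioned above, assuming a Spark session with an Iceberg catalog already configured (the catalog, schema, and table names here are illustrative, and the snapshot ID is a placeholder):

```sql
-- Create an Iceberg table with ACID guarantees
CREATE TABLE demo.db.events (
  id      BIGINT,
  payload STRING,
  ts      TIMESTAMP
) USING iceberg;

-- Schema evolution: add a column without rewriting existing data files
ALTER TABLE demo.db.events ADD COLUMN region STRING;

-- Time travel: query an earlier snapshot by ID or by timestamp
SELECT * FROM demo.db.events VERSION AS OF 4512386758970163436;
SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00';
```

The same table can then be read by other Iceberg-aware engines such as Trino or Flink, which is where the cross-engine flexibility comes from.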

holly
Valued Contributor III

Hello, if you're planning on building your own open-source stack of Spark + Iceberg, it can be a good choice.

If you're on Databricks, however, you're going to miss out on a *lot* of Delta features that are baked into the platform, specifically compute- and storage-level performance optimisations and Unity Catalog integrations. Delta has ACID compliance, works beautifully with large datasets, and gives you several performance options such as liquid clustering or legacy Z-ordering.
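For illustration, a hedged sketch of both clustering options on Databricks (table and column names are made up):

```sql
-- Liquid clustering: declared at table creation, no fixed partitioning needed,
-- and the clustering keys can be changed later without rewriting the table
CREATE TABLE main.db.events (
  id         BIGINT,
  event_date DATE,
  payload    STRING
) CLUSTER BY (event_date);

-- Legacy Z-ordering: applied to an existing Delta table via OPTIMIZE
OPTIMIZE main.db.events_legacy ZORDER BY (event_date);
```

Liquid clustering is generally the recommended path for new tables, since Z-ordering requires periodically re-running OPTIMIZE and locks you into the chosen columns more rigidly.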

If you're integrating with other systems that are only Iceberg-compatible, check out UniForm, which writes out additional Iceberg metadata so those systems can read from the Delta table: https://docs.databricks.com/en/delta/uniform.html
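A minimal sketch of enabling UniForm at table creation, per the linked docs (the table name is illustrative):

```sql
-- Delta table that also publishes Iceberg metadata, so Iceberg clients can read it
CREATE TABLE main.db.orders (
  id     BIGINT,
  amount DECIMAL(10,2)
) TBLPROPERTIES (
  'delta.enableIcebergCompatV2'           = 'true',
  'delta.universalFormat.enabledFormats'  = 'iceberg'
);
```

Writes still go through Delta; UniForm generates the Iceberg metadata asynchronously so external Iceberg readers see the table without a separate copy of the data.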
