Databricks + Apache Iceberg = advantageous or wasted effort due to duplicate functionality?
07-31-2024 11:40 AM
I'm trying to design a Lakehouse with Spark at the base layer. Now I'm wondering whether adding Apache Iceberg beneath Spark would help or not. I prefer Iceberg for its auto indexing and ACID query facilities over big heterogeneous datasets. Is it a wise choice?
08-06-2024 03:10 AM
Hello, if you're planning on building your own open source stack of Spark + Iceberg, it can be a good choice.
If you're on Databricks, however, you're going to miss out on a *lot* of Delta features that are baked into the platform, specifically the compute and storage performance optimisations and the Unity Catalog (UC) integrations. Delta is ACID compliant, works beautifully with large datasets, and gives you several performance options, such as liquid clustering or legacy Z-ordering.
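To make the two layout options a bit more concrete, here is a rough sketch of the SQL involved, wrapped in small Python helpers so you can see the shape of the statements. The helper names, the `events` table and its columns are made up for illustration; on Databricks you would pass the resulting strings to `spark.sql(...)`.

```python
def liquid_clustering_ddl(table, columns, cluster_by):
    """Build a CREATE TABLE statement that uses liquid clustering (CLUSTER BY).

    `columns` maps column name -> SQL type; `cluster_by` lists clustering keys.
    All names here are hypothetical examples.
    """
    unknown = [c for c in cluster_by if c not in columns]
    if unknown:
        raise ValueError(f"clustering keys not in table: {unknown}")
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    keys = ", ".join(cluster_by)
    return f"CREATE TABLE {table} ({cols}) CLUSTER BY ({keys})"


def zorder_sql(table, zorder_by):
    """Build the legacy alternative: an OPTIMIZE ... ZORDER BY statement."""
    keys = ", ".join(zorder_by)
    return f"OPTIMIZE {table} ZORDER BY ({keys})"


if __name__ == "__main__":
    schema = {"event_id": "BIGINT", "event_date": "DATE"}
    print(liquid_clustering_ddl("events", schema, ["event_date"]))
    print(zorder_sql("events", ["event_date"]))
```

Liquid clustering is declared once on the table, whereas Z-ordering is applied each time you run `OPTIMIZE`, so you would normally pick one or the other for a given table.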
If you're integrating with other systems that are only Iceberg compatible, check out UniForm, which writes out additional Iceberg metadata so those systems can read the Delta table: https://docs.databricks.com/en/delta/uniform.html
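As a minimal sketch of what enabling UniForm looks like, the helper below builds a `CREATE TABLE` statement with the two table properties I recall from the page linked above. Please check that page for the current property names and requirements (for example, Unity Catalog registration and runtime version). The helper name and the `orders` table are hypothetical.

```python
# Table properties for Iceberg reads via UniForm, as I recall them from the
# Databricks docs; treat them as an assumption and verify against the docs.
UNIFORM_PROPERTIES = {
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}


def uniform_ddl(table, columns):
    """Build a CREATE TABLE statement for a Delta table with UniForm enabled.

    `columns` maps column name -> SQL type. Table and column names are
    illustrative only.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    props = ", ".join(f"'{k}' = '{v}'" for k, v in UNIFORM_PROPERTIES.items())
    return f"CREATE TABLE {table} ({cols}) TBLPROPERTIES ({props})"


if __name__ == "__main__":
    # On Databricks you would run this with spark.sql(...).
    print(uniform_ddl("orders", {"order_id": "BIGINT", "amount": "DOUBLE"}))
```

The table stays a Delta table for everything on the Databricks side; the extra Iceberg metadata is only there for external readers.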

