Databricks + Apache Iceberg = advantageous or wasted effort due to duplicate functionality ?

ag2all — Wed, 31 Jul 2024 18:40:35 GMT

I'm trying to design a lakehouse with Spark at the base layer. I'm wondering whether adding Apache Iceberg beneath Spark would help or not. I prefer Iceberg for its auto indexing and ACID query facilities over big heterogeneous datasets. Is it a wise choice?

Re: Databricks + Apache Iceberg = advantageous or wasted effort due to duplicate functionality ?

holly — Tue, 06 Aug 2024 10:10:06 GMT

Hello, if you're planning on building your own open source stack of Spark + Iceberg, it can be a good choice.
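
For the self-managed route, a minimal sketch of the Spark settings that register an Iceberg catalog might look like the following. The catalog name `lake` and the warehouse path are placeholders, and the Hadoop (filesystem) catalog type is only one assumption among several options. The matching iceberg-spark-runtime jar would also need to be on the classpath.

```python
from __future__ import annotations


def iceberg_spark_conf(catalog: str, warehouse: str) -> dict[str, str]:
    """Build Spark conf entries that register an Iceberg catalog.

    Assumes a Hadoop-type (filesystem) catalog. `catalog` and `warehouse`
    are illustrative placeholders, not values from this thread.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions (MERGE INTO, CALL procedures, ...)
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog implementation under the chosen name
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    # Print the entries as spark-submit style flags
    conf = iceberg_spark_conf("lake", "s3://my-bucket/warehouse")
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

The same key/value pairs could be passed to `SparkSession.builder.config(...)` instead of the command line.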

If you're on Databricks, however, you're going to miss out on a *lot* of Delta features that are baked into the platform, specifically compute and storage performance optimisations and Unity Catalog (UC) integrations. Delta has ACID compliance, works beautifully with large datasets, and gives you several performance options, such as liquid clustering or legacy Z-ordering.
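
To make the two layout options concrete, here is a rough sketch that builds the Databricks SQL for each. The table and column names are made up for illustration, and the helper functions are hypothetical conveniences, not part of any Databricks API. Note that the two are alternatives: a liquid-clustered table is not also Z-ordered.

```python
from __future__ import annotations


def liquid_clustered_table_sql(table: str, columns_ddl: str,
                               cluster_cols: list[str]) -> str:
    """CREATE TABLE statement for a Delta table with liquid clustering."""
    return (
        f"CREATE TABLE {table} ({columns_ddl}) "
        f"USING DELTA CLUSTER BY ({', '.join(cluster_cols)})"
    )


def zorder_sql(table: str, zorder_cols: list[str]) -> str:
    """OPTIMIZE statement that applies legacy Z-ordering to an existing table."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only
    print(liquid_clustered_table_sql(
        "events", "event_id BIGINT, event_date DATE", ["event_date"]))
    print(zorder_sql("events_legacy", ["event_date"]))
```

On a cluster, each string would be run with `spark.sql(...)` or pasted into a SQL cell.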

If you're integrating with other systems that are only Iceberg compatible, check out UniForm, which writes out additional Iceberg metadata so those systems can read the Delta table: https://docs.databricks.com/en/delta/uniform.html
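
As a rough sketch of what enabling UniForm looks like, the snippet below builds a CREATE TABLE statement with the Iceberg-related table properties. The property names are the ones I understand the linked page to document, so verify them against your runtime version. The table name, columns, and helper function are hypothetical.

```python
# Table properties assumed from the UniForm documentation; check the linked
# page for the exact requirements of your Databricks Runtime version.
UNIFORM_ICEBERG_PROPERTIES = {
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}


def uniform_table_sql(table: str, columns_ddl: str) -> str:
    """CREATE TABLE statement for a Delta table that also emits Iceberg metadata."""
    props = ", ".join(
        f"'{key}' = '{value}'"
        for key, value in UNIFORM_ICEBERG_PROPERTIES.items()
    )
    return f"CREATE TABLE {table} ({columns_ddl}) TBLPROPERTIES ({props})"


if __name__ == "__main__":
    # Hypothetical table, for illustration only
    print(uniform_table_sql("sales", "order_id BIGINT, amount DOUBLE"))
```

The table stays a Delta table for writes on Databricks; the extra metadata is what lets Iceberg-only readers see it.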
