How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

messiah — Tue, 04 Feb 2025 06:51:13 GMT

Hi Databricks Community,

I’m trying to create Apache Iceberg tables in Databricks using Parquet files stored in an S3 bucket. I found a guide from Dremio, but I’m unable to create Iceberg tables using that method.

Here’s what I need:

Read Parquet files from S3.
Write them as Iceberg tables in Databricks.

Questions:

What cluster configurations (Spark configs, dependencies, etc.) are needed for Iceberg support?
Is there a native way to use Iceberg in Databricks, or do I need to upload JAR files?

Any step-by-step guidance or sample code would be helpful!

Thanks in advance!

Re: How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

Ayushi_Suthar — Tue, 04 Feb 2025 07:40:47 GMT

Hi @messiah, good day!

Please follow the steps below to create an Iceberg table in Databricks:

1. Create the Iceberg table from data files in a supported file format at your storage location. The supported formats are defined in the Iceberg spec:

https://iceberg.apache.org/spec/#:~:text=Version%201%20of%20the%20Iceberg,Parquet%2C%20Avro%2C%20and%20ORC

Iceberg supports Parquet, Avro, and ORC.

2. Before creating the table, install the Iceberg JAR (not a Python library) on your cluster, choosing the build that matches your cluster's Spark version. You can download it from here: https://iceberg.apache.org/releases/#downloads

You can follow this document to install the downloaded JAR file on a cluster: https://docs.databricks.com/en/libraries/cluster-libraries.html#install-a-library-on-a-cluster

Please find an example below:
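A minimal sketch of the two steps above, assuming the Iceberg Spark runtime JAR is already installed on the cluster and an Iceberg catalog is configured in the Spark settings. The S3 path and the table name `iceberg_catalog.db.events` are hypothetical placeholders, and `parquet_to_iceberg` is an illustrative helper, not a Databricks or Iceberg API:

```python
def parquet_to_iceberg(spark, source_path, target_table):
    """Read Parquet files and write them out as an Iceberg table.

    `spark` is an active SparkSession (predefined in Databricks notebooks).
    `target_table` must live in a catalog configured for Iceberg.
    """
    # Load the existing Parquet files into a DataFrame.
    df = spark.read.parquet(source_path)
    # DataFrameWriterV2: create the table, or replace it if it already exists.
    df.writeTo(target_table).using("iceberg").createOrReplace()
    return df


# Hypothetical usage in a notebook:
# parquet_to_iceberg(spark, "s3://my-bucket/raw/events/", "iceberg_catalog.db.events")
```

Whether this runs depends on the runtime version and access mode of the cluster, so treat it as a starting point rather than a confirmed recipe.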

For other details, you can check this document:

https://docs.databricks.com/en/external-access/iceberg.html

Please let me know if this helps, and leave a like if this information is useful. Follow-ups are appreciated.

Kudos

Ayushi

Re: How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

messiah — Tue, 04 Feb 2025 08:01:20 GMT

Hi Ayushi,

and this is what happens when I use the fully qualified class name.

and this is my library.

What could be the issue here?

Thanks

Re: How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

Manabian — Tue, 25 Feb 2025 06:32:33 GMT

I found that Apache Iceberg works on Databricks via the Hadoop catalog with the following settings:

- Use a Databricks Runtime version of 12.2 LTS or earlier.
- Set the access mode to "No isolation shared" (the mode where Unity Catalog cannot be used).
- Use a library compatible with Java 8 (i.e., an Iceberg library earlier than version 1.6.1).
- Apply the necessary Iceberg-related settings in the Spark configuration.
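For the last item, here is a rough sketch of what the Spark configuration might contain. The property names are the standard Iceberg Spark catalog settings; the catalog name `iceberg_catalog`, the warehouse path, and both helper functions are hypothetical and only there for illustration:

```python
def iceberg_hadoop_catalog_conf(catalog_name, warehouse_path):
    """Build the Spark properties for an Iceberg Hadoop catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions (e.g. CALL procedures).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" selects the file-system based Hadoop catalog.
        f"{prefix}.type": "hadoop",
        # Root directory where table data and metadata are stored.
        f"{prefix}.warehouse": warehouse_path,
    }


def to_cluster_spark_config(conf):
    """Format properties as 'key value' lines for the cluster's Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Hypothetical catalog name and bucket; replace with your own.
conf = iceberg_hadoop_catalog_conf("iceberg_catalog", "s3://my-bucket/iceberg-warehouse")
print(to_cluster_spark_config(conf))
```

The printed lines can be pasted into the cluster's Spark config field, after which tables are addressed as `iceberg_catalog.<database>.<table>`.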

There is also an article (in Japanese) that explains how to resolve the errors:

- https://qiita.com/manabian/items/4c2c78c7db77f704e5ab

Re: How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

Raashid_Khan — Wed, 05 Mar 2025 15:31:00 GMT

How do I create or insert into Databricks tables in Iceberg format? I have Iceberg Parquet files in GCS and want to store them as Iceberg tables in Databricks catalogs.

Re: How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

Manabian — Tue, 11 Mar 2025 06:26:32 GMT

Unity Catalog does not support Iceberg tables in Databricks. One workaround is to bring the Iceberg table into Databricks as a Delta table using a deep clone operation. However, please note that this method does not support features such as Merge-on-Read (MoR) or partition evolution.

ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation

Unfortunately, Unity Catalog does not support shallow clone either.

ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation
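As a sketch of the deep clone workaround, the statement below follows the `CLONE iceberg.` form described in the referenced Databricks documentation. The target table name and the GCS path are hypothetical, and `clone_iceberg_to_delta_sql` is just an illustrative helper. Note that the resulting table is a Delta table, not an Iceberg one:

```python
def clone_iceberg_to_delta_sql(target_table, iceberg_path):
    """Build a deep clone statement that copies an Iceberg table into a Delta table."""
    # The source is addressed by path using the iceberg.`<path>` form.
    return f"CREATE OR REPLACE TABLE {target_table} CLONE iceberg.`{iceberg_path}`"


# Hypothetical target table and source location; replace with your own.
statement = clone_iceberg_to_delta_sql(
    "main.default.events_delta",
    "gs://my-bucket/warehouse/db/events",
)
print(statement)

# In a Databricks notebook, where `spark` is predefined:
# spark.sql(statement)
```

Re-running the same statement later should pick up changes from the source incrementally, as the referenced documentation describes.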

Additionally, there is a Japanese guide that explains how to perform a deep clone on Azure Storage, which may offer useful insights:

Results of cloning an Apache Iceberg table on Databricks #iceberg - Qiita