How to Create Iceberg Tables in Databricks Using Parquet Files from S3?

messiah
New Contributor II

Hi Databricks Community,

I'm trying to create Apache Iceberg tables in Databricks using Parquet files stored in an S3 bucket. I found a guide from Dremio, but I'm unable to create Iceberg tables using that method.

Here's what I need:

  1. Read Parquet files from S3.
  2. Write them as Iceberg tables in Databricks.

Questions:

  1. What cluster configurations (Spark configs, dependencies, etc.) are needed for Iceberg support?
  2. Is there a native way to use Iceberg in Databricks, or do I need to upload JAR files?

Any step-by-step guidance or sample code would be helpful!

Thanks in advance!


Ayushi_Suthar
Databricks Employee

Hi @messiah, good day!

Please follow the steps below to create an Iceberg table in Databricks:

1. Create the Iceberg table from data files in a supported file format, stored in your storage location. The supported formats are defined here:

https://iceberg.apache.org/spec/#:~:text=Version%201%20of%20the%20Iceberg,Parquet%2C%20Avro%2C%20and...

Iceberg supports Parquet, Avro, and ORC data files.

2. Before creating the table, install the Iceberg JAR (not the Python library) on your cluster, choosing the build that matches your cluster's Spark version. You can download it here: https://iceberg.apache.org/releases/#downloads

To install the downloaded JAR on a cluster, follow this document: https://docs.databricks.com/en/libraries/cluster-libraries.html#install-a-library-on-a-cluster
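A small sketch for picking the matching JAR, assuming you install it as a Maven library on the cluster rather than uploading the file: Iceberg publishes one Spark runtime artifact per Spark major.minor line and Scala version, named iceberg-spark-runtime-<spark>_<scala>. The helper name, the default Scala 2.12, and the Iceberg version 1.5.2 in the example are illustrative assumptions; check the releases page for the build that fits your runtime.

```python
def iceberg_runtime_coordinate(spark_version: str, iceberg_version: str,
                               scala_version: str = "2.12") -> str:
    """Build the Maven coordinate of the Iceberg Spark runtime JAR (hypothetical helper).

    Iceberg ships one runtime artifact per Spark major.minor line, e.g.
    iceberg-spark-runtime-3.3_2.12 for Spark 3.3.x on Scala 2.12.
    """
    parts = spark_version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unexpected Spark version: {spark_version!r}")
    spark_line = f"{parts[0]}.{parts[1]}"  # drop the patch number
    return (f"org.apache.iceberg:iceberg-spark-runtime-"
            f"{spark_line}_{scala_version}:{iceberg_version}")


if __name__ == "__main__":
    # Databricks Runtime 12.2 LTS ships Spark 3.3.2 with Scala 2.12.
    print(iceberg_runtime_coordinate("3.3.2", "1.5.2"))
```

On a running cluster, spark.version should give you the Spark version string to pass in.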

Please find an example below:

[Screenshot not available]

 

For more details, see this document:

https://docs.databricks.com/en/external-access/iceberg.html
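For the two steps in the original question (read Parquet from S3, write an Iceberg table), one option once the JAR is installed is a CREATE TABLE ... AS SELECT that reads the files directly with Spark SQL's parquet.`<path>` syntax. This is a minimal sketch that only builds the SQL string; it assumes an Iceberg catalog named my_iceberg is already configured on the cluster, and the catalog, table, and bucket names are hypothetical.

```python
def ctas_from_parquet(table: str, parquet_path: str) -> str:
    """Build a Spark SQL CTAS that copies Parquet files into a new Iceberg table.

    `table` should include the Iceberg catalog name, e.g. "my_iceberg.db.events"
    (hypothetical); `parquet_path` points at a directory or file of Parquet data.
    """
    if "`" in parquet_path:
        # A backtick would terminate the quoted path in the generated SQL.
        raise ValueError("parquet_path must not contain backticks")
    return (
        f"CREATE TABLE {table} USING iceberg "
        f"AS SELECT * FROM parquet.`{parquet_path}`"
    )


if __name__ == "__main__":
    sql = ctas_from_parquet("my_iceberg.db.events", "s3://my-bucket/raw/events/")
    print(sql)
    # On the cluster you would then run: spark.sql(sql)
```

The DataFrame API equivalent would be spark.read.parquet(path).writeTo("my_iceberg.db.events").using("iceberg").create(). Both approaches rewrite the data into the table's location; Iceberg also documents an add_files procedure for registering existing Parquet files without copying them, which may be worth a look if rewriting is too costly.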

Please let me know if this helps. If this information is useful, please leave a like; follow-ups are appreciated.


Ayushi

 

 

messiah
New Contributor II

Hi Ayushi,

[Screenshot not available]

And this is what happens when I use the fully qualified class name.

[Screenshot not available]

And this is my library:

[Screenshot not available]

What could be the issue here?

Thanks

Manabian
New Contributor III

To use Apache Iceberg via the Hadoop Catalog on Databricks, I found that the following settings work:

- Use Databricks Runtime 12.2 LTS or earlier.
- Set the access mode to "No isolation shared" (the mode where Unity Catalog cannot be used).
- Use a library compatible with Java 8 (i.e., an Iceberg library earlier than version 1.6.1).
- Apply the necessary Iceberg-related settings in the Spark configuration.
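For the last point, the Iceberg documentation describes registering a Hadoop catalog through spark.sql.catalog.<name> properties plus the Iceberg SQL extensions. Since spark.sql.extensions must be set before the Spark session starts, on Databricks these typically go into the cluster's Spark config (one "key value" pair per line) rather than spark.conf.set. The sketch below renders such lines; the catalog name my_iceberg and the warehouse path are hypothetical, and this is not necessarily the exact configuration used in the linked article.

```python
def iceberg_hadoop_catalog_conf(catalog: str, warehouse: str) -> dict:
    """Spark properties for an Iceberg catalog backed by a Hadoop (path-based) catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions (e.g. CALL procedures).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers a catalog named `catalog` implemented by Iceberg.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" keeps table metadata directly under the warehouse path.
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def to_cluster_spark_config(conf: dict) -> str:
    """Render properties as 'key value' lines for the cluster's Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    conf = iceberg_hadoop_catalog_conf("my_iceberg", "s3://my-bucket/iceberg-warehouse")
    print(to_cluster_spark_config(conf))
```

Tables created in this catalog are then addressed as my_iceberg.<db>.<table>.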

There is also an article (in Japanese) that explains how to resolve the errors:

- https://qiita.com/manabian/items/4c2c78c7db77f704e5ab


Raashid_Khan
New Contributor II

How do I create or insert into Iceberg-format tables in Databricks? I have Iceberg Parquet files in GCS and want to store them as Iceberg tables in Databricks catalogs.

Unity Catalog does not support Iceberg tables in Databricks. One workaround is to convert the Iceberg tables into Delta tables with a deep clone operation. However, please note that cloning does not support Iceberg tables that use features such as Merge-on-Read (MoR) or partition evolution.

ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation
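A minimal sketch of the deep clone workaround, assuming the CLONE iceberg.`<path>` form for path-based Iceberg tables described on the referenced page: it builds a statement that copies an Iceberg table into a Delta table in a Unity Catalog schema. The helper name, the target main.default.events, and the GCS path are hypothetical, and the result is a Delta table, not an Iceberg table.

```python
def deep_clone_iceberg_sql(target_table: str, iceberg_path: str) -> str:
    """Build a Databricks SQL statement that deep-clones a path-based Iceberg table into Delta.

    `target_table` is the Delta table to create, e.g. "main.default.events" (hypothetical);
    `iceberg_path` is the root directory of the Iceberg table (the one containing metadata/).
    """
    if "`" in iceberg_path:
        # A backtick would terminate the quoted path in the generated SQL.
        raise ValueError("iceberg_path must not contain backticks")
    return (
        f"CREATE OR REPLACE TABLE {target_table} "
        f"DEEP CLONE iceberg.`{iceberg_path}`"
    )


if __name__ == "__main__":
    sql = deep_clone_iceberg_sql("main.default.events", "gs://my-bucket/iceberg/events")
    print(sql)
    # On Databricks you would then run: spark.sql(sql)
```

As the page title suggests, re-running the same statement after the source changes is meant to sync incrementally rather than copy everything again; the MoR and partition evolution limitations still apply.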

Unfortunately, Unity Catalog does not support shallow clone either.

ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation

Additionally, there is a Japanese guide that explains how to perform a deep clone on Azure Storage, which may offer useful insights:
