02-03-2025 10:51 PM
Hi Databricks Community,
I’m trying to create Apache Iceberg tables in Databricks using Parquet files stored in an S3 bucket. I found a guide from Dremio, but I’m unable to create Iceberg tables using that method.
Here’s what I need:
Questions:
Any step-by-step guidance or sample code would be helpful!
Thanks in advance!
02-03-2025 11:40 PM
Hi @messiah, good day!
Please follow the steps below to create an Iceberg table in Databricks:
1. Create the Iceberg table from data files in a supported file format stored in your storage location (the supported formats are defined here).
The supported formats are Parquet, Avro, and ORC.
2. Before creating the table, install the Iceberg runtime JAR (not the Python library) on your cluster, choosing the JAR that matches your cluster's Spark version. You can download it here: https://iceberg.apache.org/releases/#downloads
To install the downloaded JAR on a cluster, follow this document: https://docs.databricks.com/en/libraries/cluster-libraries.html#install-a-library-on-a-cluster
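Here is a small sketch of how the Maven coordinate for the runtime JAR is usually composed, assuming the `iceberg-spark-runtime-<Spark major.minor>_<Scala version>` artifact naming used on Maven Central. The versions in the usage line are my assumptions for illustration (Databricks Runtime 12.2 LTS shipping Spark 3.3.2 with Scala 2.12, and Iceberg 1.4.3 as an arbitrary pre-1.6.1 release); check your runtime's release notes and the Iceberg downloads page before installing.

```python
def iceberg_runtime_coordinate(spark_version: str, scala_version: str, iceberg_version: str) -> str:
    """Build the Maven coordinate of the Iceberg Spark runtime JAR.

    The artifact name encodes only the Spark major.minor version plus the
    Scala binary version, e.g. iceberg-spark-runtime-3.3_2.12.
    """
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    return (
        "org.apache.iceberg:"
        f"iceberg-spark-runtime-{spark_major_minor}_{scala_version}:{iceberg_version}"
    )


# Assumed versions, for illustration only.
print(iceberg_runtime_coordinate("3.3.2", "2.12", "1.4.3"))
# org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.4.3
```

The resulting string is what you would enter as the Maven coordinate when installing a library from Maven on the cluster.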
Please find an example below:
For other details, you can check this document:
https://docs.databricks.com/en/external-access/iceberg.html
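For the original question (existing Parquet files in S3), one option not covered in the steps above is Iceberg's `add_files` Spark procedure, which registers existing Parquet files in an existing Iceberg table without rewriting them. The sketch below only builds the SQL string. The catalog name `my_catalog`, the table `db.events`, and the S3 path are hypothetical; the target Iceberg table must already exist with a compatible schema, and `CALL` procedures require the Iceberg SQL extensions to be enabled in the Spark config.

```python
def add_parquet_files_sql(catalog: str, table: str, parquet_path: str) -> str:
    """Build an Iceberg add_files call that registers existing Parquet files.

    add_files only writes Iceberg metadata pointing at the files; it does
    not copy or rewrite the Parquet data itself.
    """
    # Source is addressed as a path-based Parquet "table": `parquet`.`<path>`
    source = f"`parquet`.`{parquet_path}`"
    return (
        f"CALL {catalog}.system.add_files("
        f"table => '{table}', "
        f"source_table => '{source}')"
    )


# Hypothetical names and path, for illustration only.
print(add_parquet_files_sql("my_catalog", "db.events", "s3://my-bucket/events/"))
```

On the cluster you would then pass the generated statement to `spark.sql(...)`.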
Please let me know if this helps, and leave a like if this information is useful. Follow-ups are appreciated.
Kudos
Ayushi
02-04-2025 12:00 AM - edited 02-04-2025 12:01 AM
Hi Ayushi,
And this is what happens when I use the fully qualified class name.
And this is my library.
What could be the issue here?
Thanks
02-24-2025 10:32 PM
I found that Apache Iceberg works via the Hadoop Catalog on Databricks with the following settings:
- Use Databricks Runtime 12.2 LTS or earlier.
- Set the access mode to "No isolation shared" (the mode where Unity Catalog cannot be used).
- Use a library compatible with Java 8 (i.e., an Iceberg library earlier than version 1.6.1).
- Apply the necessary Iceberg-related settings in the Spark configuration.
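The Iceberg-related Spark settings from the last bullet might look like the following for a Hadoop catalog. This is a sketch: the catalog name `iceberg_hadoop` and the warehouse path are hypothetical, and the second helper just renders the settings in the one `key value` pair per line form that the cluster's Spark config field expects.

```python
from typing import Dict


def iceberg_hadoop_catalog_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Spark settings for an Iceberg catalog backed by a file-system (Hadoop) warehouse."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg SQL extensions such as CALL procedures.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers a Spark catalog implemented by Iceberg under the given name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" keeps table metadata as files under the warehouse path.
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def as_cluster_spark_config(conf: Dict[str, str]) -> str:
    # One "key value" pair per line, separated by a single space.
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Hypothetical catalog name and warehouse path.
conf = iceberg_hadoop_catalog_conf("iceberg_hadoop", "s3://my-bucket/iceberg-warehouse")
print(as_cluster_spark_config(conf))
```

Tables would then be addressed as `iceberg_hadoop.<db>.<table>` in Spark SQL.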
There is also an article (in Japanese) that explains how to resolve the errors:
- https://qiita.com/manabian/items/4c2c78c7db77f704e5ab
03-05-2025 07:31 AM
How can I create Iceberg-format tables in Databricks, or insert into them? I have Iceberg Parquet files in GCS and want to store them as Iceberg tables in Databricks catalogs.
03-10-2025 11:26 PM
Unity Catalog does not support Iceberg tables in Databricks. One workaround is to use a deep clone operation to copy the Iceberg tables into Delta tables. However, please note that cloning does not support Iceberg source tables that use features such as Merge-on-Read (MoR) or partition evolution.
ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation
Unfortunately, Unity Catalog does not support shallow clone either.
ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation
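Cloning an Iceberg table into a Delta table is a single SQL statement. The helper below is a sketch that only builds that statement; the target table name and the GCS path are hypothetical. `CLONE` without the `SHALLOW` keyword performs a deep clone, which is the variant that applies here given the shallow clone limitation above.

```python
def clone_iceberg_to_delta_sql(target_table: str, iceberg_path: str) -> str:
    """Build a deep-clone statement copying a path-based Iceberg table into Delta.

    CLONE without SHALLOW defaults to a deep clone (data files are copied).
    """
    return f"CREATE OR REPLACE TABLE {target_table} CLONE iceberg.`{iceberg_path}`"


# Hypothetical Unity Catalog target and GCS location.
print(clone_iceberg_to_delta_sql("main.default.events", "gs://my-bucket/iceberg/events"))
```

The generated statement can be run from a notebook with `spark.sql(...)` or pasted into a SQL cell.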
Additionally, there is a Japanese guide that explains how to perform a deep clone on Azure Storage, which may offer useful insights: