02-03-2025 10:51 PM
Hi Databricks Community,
I'm trying to create Apache Iceberg tables in Databricks using Parquet files stored in an S3 bucket. I found a guide from Dremio, but I'm unable to create Iceberg tables using that method.
Here's what I need:
Questions:
Any step-by-step guidance or sample code would be helpful!
Thanks in advance!
02-03-2025 11:40 PM
Hi @messiah, good day!
Please follow the steps below to create an Iceberg table in Databricks:
1. Create the Iceberg table from data files stored in your storage location in a supported file format (defined here).
The supported formats are Parquet, Avro, and ORC.
2. Before creating the table, install the Iceberg JAR (not the Python library) on your cluster, choosing the build that matches your cluster's Spark version. You can download it here: https://iceberg.apache.org/releases/#downloads
To install the downloaded JAR file on a cluster, follow this document: https://docs.databricks.com/en/libraries/cluster-libraries.html#install-a-library-on-a-cluster
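The Iceberg runtime JAR is published separately for each Spark minor version and Scala binary version, so the artifact you need depends on the cluster. As a sketch, the helper below builds the Maven coordinates you could enter when installing a Maven library on the cluster instead of uploading a downloaded JAR. The function name is hypothetical, and the Iceberg version in the usage line is only a placeholder; check the downloads page for releases that match your runtime.

```python
def iceberg_runtime_coordinate(
    spark_version: str, iceberg_version: str, scala_version: str = "2.12"
) -> str:
    """Return Maven coordinates for the Iceberg Spark runtime JAR (hypothetical helper).

    The artifact name encodes the Spark major.minor version and the Scala
    binary version, e.g. iceberg-spark-runtime-3.3_2.12.
    """
    parts = spark_version.split(".")
    # Require at least "major.minor" made of digits, e.g. "3.3" or "3.3.2".
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unexpected Spark version: {spark_version!r}")
    spark_major_minor = f"{parts[0]}.{parts[1]}"
    return (
        "org.apache.iceberg:"
        f"iceberg-spark-runtime-{spark_major_minor}_{scala_version}:{iceberg_version}"
    )


if __name__ == "__main__":
    # Placeholder versions: Spark 3.3.2 (as in DBR 12.2 LTS) and Iceberg 1.5.2.
    print(iceberg_runtime_coordinate("3.3.2", "1.5.2"))
```

Databricks Runtime 12.2 LTS includes Apache Spark 3.3.2 with Scala 2.12, so the usage line yields an `iceberg-spark-runtime-3.3_2.12` artifact.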
Please find an example below:
For other details, you can check this document:
https://docs.databricks.com/en/external-access/iceberg.html
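Once the runtime JAR is installed and an Iceberg catalog is configured on the cluster, one way to turn existing Parquet files into an Iceberg table is a `CREATE TABLE ... USING iceberg AS SELECT` statement that reads the files directly by path. This is a minimal sketch, not the example from the reply above: the catalog name `my_catalog`, the table `db.events`, the bucket path, and both function names are hypothetical.

```python
def parquet_to_iceberg_sql(table: str, parquet_path: str) -> str:
    """Build a CTAS statement that copies Parquet files into a new Iceberg table.

    `table` must be qualified with an Iceberg catalog configured on the
    cluster, e.g. "my_catalog.db.events" (hypothetical).
    """
    if "`" in parquet_path:
        # The path is embedded in backticks, so it must not contain any.
        raise ValueError("path must not contain backticks")
    return (
        f"CREATE TABLE {table} USING iceberg "
        f"AS SELECT * FROM parquet.`{parquet_path}`"
    )


def create_iceberg_table(spark, table: str, parquet_path: str) -> None:
    """Run the CTAS on an existing SparkSession that has the Iceberg JAR installed."""
    spark.sql(parquet_to_iceberg_sql(table, parquet_path))


if __name__ == "__main__":
    # In a Databricks notebook you would call:
    # create_iceberg_table(spark, "my_catalog.db.events", "s3://my-bucket/events/")
    print(parquet_to_iceberg_sql("my_catalog.db.events", "s3://my-bucket/events/"))
```

CTAS rewrites the data into new Iceberg data files. If you would rather register the existing Parquet files in place, Iceberg's `add_files` procedure is the usual alternative, but it needs the Iceberg SQL extensions enabled on the cluster.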
Please let me know if this helps, and leave a like if this information is useful. Follow-ups are appreciated.
Kudos
Ayushi
02-04-2025 12:00 AM - edited 02-04-2025 12:01 AM
Hi Ayushi,
And this is what happens when I use the fully qualified class name.
And this is my library.
What could be the issue here?
Thanks
a month ago
I found that Apache Iceberg with the Hadoop catalog works on Databricks with the following settings:
- Use Databricks Runtime 12.2 LTS or earlier.
- Set the access mode to "No isolation shared" (the mode where Unity Catalog cannot be used).
- Use a library compatible with Java 8 (i.e., an Iceberg library earlier than version 1.6.1).
- Apply the necessary Iceberg-related settings in the Spark configuration.
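For the last item, the Iceberg settings for a Hadoop catalog come down to four Spark properties. The sketch below builds them and renders them as `key value` lines, the format the cluster's Spark config field accepts. The catalog name `my_catalog`, the warehouse path, and the function names are placeholders. `spark.sql.extensions` is a static setting read when the Spark session starts, so it generally has to go into the cluster configuration rather than be set from a notebook.

```python
def iceberg_hadoop_catalog_conf(catalog: str, warehouse: str) -> dict:
    """Spark properties for an Iceberg catalog of type "hadoop" (hypothetical helper)."""
    if not catalog or "." in catalog:
        raise ValueError("catalog name must be non-empty and contain no dots")
    return {
        # Enables Iceberg SQL extensions (e.g. CALL procedures).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers a Spark catalog named `catalog`, implemented by Iceberg.
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        # Hadoop catalog: table metadata is tracked as files under the warehouse path.
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def to_cluster_spark_config(conf: dict) -> str:
    """Render properties as 'key value' lines for the cluster's Spark config field."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    conf = iceberg_hadoop_catalog_conf("my_catalog", "s3://my-bucket/iceberg-warehouse")
    print(to_cluster_spark_config(conf))
```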
There is also an article (in Japanese) that explains how to resolve the errors:
- https://qiita.com/manabian/items/4c2c78c7db77f704e5ab
3 weeks ago
How do I create or insert into Databricks tables in Iceberg format? I have Iceberg Parquet files in GCS and want to store them as Iceberg tables in Databricks catalogs.
2 weeks ago
Unity Catalog does not support Iceberg tables in Databricks. One workaround is to convert the Iceberg tables into Delta Lake tables with a deep clone operation. However, note that this method does not support Iceberg features such as merge-on-read (MoR) or partition evolution.
ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation
Unfortunately, Unity Catalog does not support shallow clone either.
ref: Incrementally clone Parquet and Iceberg tables to Delta Lake | Databricks Documentation
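The clone workaround is a single SQL statement that points at the root directory of the Iceberg table (the directory containing its `metadata/` folder). The helper below, a hypothetical name, builds that statement; the target table and the `gs://` path are placeholders chosen to match the GCS question above. Plain `CLONE` performs a deep clone, which copies the data into a Delta table, and the sketch deliberately offers no shallow option because of the Unity Catalog limitation mentioned above.

```python
def iceberg_clone_sql(target_table: str, iceberg_table_path: str) -> str:
    """Build a Databricks statement that deep clones an Iceberg table into Delta Lake.

    `iceberg_table_path` is the root directory of the Iceberg table, i.e. the
    directory that contains its metadata/ folder.
    """
    if "`" in iceberg_table_path:
        # The path is embedded in backticks, so it must not contain any.
        raise ValueError("path must not contain backticks")
    # CLONE without SHALLOW is a deep clone: the data files are copied.
    return f"CREATE OR REPLACE TABLE {target_table} CLONE iceberg.`{iceberg_table_path}`"


if __name__ == "__main__":
    # Placeholder three-level Unity Catalog name and GCS path.
    print(iceberg_clone_sql("main.analytics.events_delta", "gs://my-bucket/warehouse/db/events"))
```

Run the returned string with `spark.sql(...)` in a notebook, or paste it into the SQL editor.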
Additionally, there is a Japanese guide that explains how to perform a deep clone on Azure Storage, which may offer useful insights: