Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to create a managed iceberg table via REST catalog

philsch
New Contributor

We're using Iceberg's Java library to write managed Iceberg tables in Databricks. We actually can create these tables using Databricks as an Iceberg REST catalog, but only when we provide a partitioning spec, which Databricks then picks up as cluster_columns. Unfortunately, the data files we put into the partition paths (e.g. 'tables/xxx-xxx/data/_kafka_date_day=2025-05-21/xxx.parquet') remain unmaintained.

Databricks duplicates the data into its own clustering scheme.

We were told that partitioning is unsupported for managed Iceberg tables, but no one could tell us how to create a table via the Iceberg REST catalog so that it can be filtered on `__kafka_date` with correct file pruning.

Could someone provide a sample CURL to databricks for table creation that achieves this?
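For reference, the create-table request we currently send looks roughly like this (a sketch only: the endpoint path and REST prefix are assumptions, the token and column names are placeholders, and the body follows the Iceberg REST catalog spec's CreateTableRequest):

curl -X POST \
  "https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest/v1/<prefix>/namespaces/<schema>/tables" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "events",
    "schema": {
      "type": "struct",
      "fields": [
        {"id": 1, "name": "__kafka_date", "type": "timestamp", "required": false},
        {"id": 2, "name": "payload", "type": "string", "required": false}
      ]
    },
    "partition-spec": {
      "spec-id": 0,
      "fields": [
        {"source-id": 1, "field-id": 1000, "name": "_kafka_date_day", "transform": "day"}
      ]
    }
  }'

(<prefix> being whatever the catalog's /v1/config endpoint returns, typically the catalog name.)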


7 Replies

WiliamRosa
New Contributor III

Hi @philsch 
Perhaps this documentation might help you:
https://docs.databricks.com/aws/en/external-access/iceberg

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

philsch
New Contributor

Unfortunately not. This document doesn't mention table creation, and specifically not how to create a table so that it uses liquid clustering; it even mentions that partitioning is not supported for managed Iceberg tables. A table without partitioning or liquid clustering is essentially useless, because any query would require a full table scan.

szymon_dybczak
Esteemed Contributor III

Since, as you wrote, managed Iceberg tables don't support partitions, you need to use liquid clustering. To enable liquid clustering for a managed Iceberg table, you can use SQL syntax.

To enable liquid clustering, add the CLUSTER BY phrase to a table creation statement, as in the examples below:

CREATE TABLE table1(col0 INT, col1 string) CLUSTER BY (col0);

But keep in mind that for Apache Iceberg, you must explicitly disable deletion vectors and row IDs when enabling liquid clustering on a managed Iceberg table.
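For example, a creation statement for a managed Iceberg table with liquid clustering could look like the sketch below (table and column names are placeholders; 'delta.enableDeletionVectors' and 'delta.enableRowTracking' are the standard Delta property names for those features, assuming they apply here):

CREATE TABLE main.default.events (
  __kafka_date TIMESTAMP,
  payload STRING
)
USING ICEBERG
CLUSTER BY (__kafka_date)
TBLPROPERTIES (
  'delta.enableDeletionVectors' = 'false',
  'delta.enableRowTracking' = 'false'
);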

To enable liquid clustering on an existing table, use the following syntax:

-- Alter an existing table
ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>)

 

You can also try automatic liquid clustering for Unity Catalog managed Delta tables. In that case, Databricks will try to intelligently choose clustering keys to optimize query performance:

ALTER TABLE table1 CLUSTER BY AUTO;

 

Use liquid clustering for tables | Databricks on AWS

philsch
New Contributor

Thank you, but my question was specifically about creating these tables through the Iceberg REST catalog. The Java Iceberg client doesn't issue any SQL statements, nor is it capable of doing so.

szymon_dybczak
Esteemed Contributor III
(Accepted Solution)

Hi @philsch ,

Sorry, I didn't notice that. So I guess it's not possible currently. According to the documentation, Unity Catalog has a read-only implementation of the Iceberg REST Catalog API, so you can't use a client library to create a table for now. You can only use a client to read from or write to a table that was already created via the methods I described above.


I think they will eventually add an option to create tables via the REST catalog; it's quite a new feature that was released in public preview not so long ago.

philsch
New Contributor

Thanks for getting back. Weirdly, we did successfully create the tables, but since this doesn't work without partitioning, the table would eventually end up in a strange state where the Iceberg metadata and the Delta metadata created alongside it would deviate.
We worked around this by creating the table via the Databricks SQL API endpoint instead. This seems to work.
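Concretely, something along these lines (a sketch of a call to the SQL Statement Execution API; workspace URL, warehouse ID, and table/column names are placeholders):

curl -X POST "https://<workspace-url>/api/2.0/sql/statements" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "warehouse_id": "<warehouse-id>",
    "statement": "CREATE TABLE main.default.events (__kafka_date TIMESTAMP, payload STRING) USING ICEBERG CLUSTER BY (__kafka_date)"
  }'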

Although I have to say that an Iceberg table without partitioning isn't worth much, because partition and file pruning won't be possible.

szymon_dybczak
Esteemed Contributor III

I think we need to give them a bit more time and this Iceberg integration will mature. Hopefully in the future all of this will be available to us 🙂
