Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to create a managed iceberg table via REST catalog

philsch
New Contributor

We're using Iceberg's Java library to write managed Iceberg tables in Databricks. We actually can create these tables using Databricks as an Iceberg REST catalog, but only when we provide a partitioning spec, which Databricks then picks up as cluster_columns. Unfortunately, the data files we put into the partition paths (e.g. 'tables/xxx-xxx/data/_kafka_date_day=2025-05-21/xxx.parquet') remain unmaintained.

Databricks duplicates the data into its own clustering scheme.

We were told that partitioning is unsupported for managed Iceberg tables, but no one could tell us how to create a table via the Iceberg REST catalog so that it can be filtered on `__kafka_date` with correct file pruning.

Could someone provide a sample CURL to databricks for table creation that achieves this?
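For reference, the create-table request we currently send looks roughly like this (a sketch only: the endpoint path and REST prefix are assumptions, the token and column names are placeholders, and the body follows the Iceberg REST catalog spec's CreateTableRequest):

curl -X POST \
  "https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest/v1/<prefix>/namespaces/<schema>/tables" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "events",
    "schema": {
      "type": "struct",
      "fields": [
        {"id": 1, "name": "__kafka_date", "type": "timestamp", "required": false},
        {"id": 2, "name": "payload", "type": "string", "required": false}
      ]
    },
    "partition-spec": {
      "spec-id": 0,
      "fields": [
        {"source-id": 1, "field-id": 1000, "name": "_kafka_date_day", "transform": "day"}
      ]
    }
  }'

(<prefix> being whatever the catalog's /v1/config endpoint returns, typically the catalog name.)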


7 Replies

WiliamRosa
New Contributor III

Hi @philsch 
Perhaps this documentation might help you:
https://docs.databricks.com/aws/en/external-access/iceberg

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

philsch
New Contributor

Unfortunately not. This document doesn't mention table creation, and specifically not how to create a table so that it uses liquid clustering; it even mentions that partitioning is not supported for managed Iceberg tables. A table without partitioning or liquid clustering is essentially useless, because any query would require a full table scan.

szymon_dybczak
Esteemed Contributor III

Since, as you wrote, managed Iceberg tables don't support partitions, you need to use liquid clustering. To enable liquid clustering for a managed Iceberg table, you can use SQL syntax.

To enable liquid clustering, add the CLUSTER BY phrase to a table creation statement, as in the examples below:

CREATE TABLE table1(col0 INT, col1 string) CLUSTER BY (col0);

But keep in mind that for Apache Iceberg, you must explicitly disable deletion vectors and row IDs when enabling liquid clustering on a managed Iceberg table.
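For example, a creation statement for a managed Iceberg table with liquid clustering could look like the sketch below (table and column names are placeholders; 'delta.enableDeletionVectors' and 'delta.enableRowTracking' are the standard Delta property names for those features, assuming they apply here):

CREATE TABLE main.default.events (
  __kafka_date TIMESTAMP,
  payload STRING
)
USING ICEBERG
CLUSTER BY (__kafka_date)
TBLPROPERTIES (
  'delta.enableDeletionVectors' = 'false',
  'delta.enableRowTracking' = 'false'
);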

To enable liquid clustering on an existing table, use the following syntax:

-- Alter an existing table
ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>)

 

You can also try automatic liquid clustering for Unity Catalog managed Delta tables. In that case, Databricks will try to intelligently choose clustering keys to optimize query performance:

ALTER TABLE table1 CLUSTER BY AUTO;

 

Use liquid clustering for tables | Databricks on AWS

philsch
New Contributor

Thank you, but my question was specifically about creating these tables through the Iceberg REST catalog. The Java Iceberg client doesn't issue any SQL statements, nor is it capable of doing so.

szymon_dybczak
Esteemed Contributor III
(Accepted Solution)

Hi @philsch ,

Sorry, I didn't notice that. So I guess it's not possible currently. According to the documentation, Unity Catalog has a read-only implementation of the Iceberg REST Catalog API, so you can't use a client library to create a table for now. You can only use a client to read from or write to a table that was already created via the methods I described above.


I think they will eventually add an option to create tables via the REST catalog; it's quite a new feature that was released in public preview not so long ago.

philsch
New Contributor

Thanks for getting back. Weirdly, we did successfully create the tables, but since this doesn't work without partitioning, the table would eventually end up in a strange state where the Iceberg metadata and the Delta metadata created alongside it would deviate.
We worked around this by creating the table via the Databricks SQL API endpoint instead. This seems to work.
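Concretely, something along these lines (a sketch of a call to the SQL Statement Execution API; workspace URL, warehouse ID, and table/column names are placeholders):

curl -X POST "https://<workspace-url>/api/2.0/sql/statements" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "warehouse_id": "<warehouse-id>",
    "statement": "CREATE TABLE main.default.events (__kafka_date TIMESTAMP, payload STRING) USING ICEBERG CLUSTER BY (__kafka_date)"
  }'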

Although I have to say that an Iceberg table without partitioning isn't worth much, because partition and file pruning won't be possible.

szymon_dybczak
Esteemed Contributor III

I think we need to give them a bit more time and this Iceberg integration will mature. Hopefully in the future all of this will be available to us 🙂
