iceberg with partitionedBy option
07-07-2025 10:18 AM
I am able to create a Unity Catalog Iceberg-format table:
df.writeTo(full_table_name).using("iceberg").create()
However, if I add the partitionedBy option, I get an error:
df.writeTo(full_table_name).using("iceberg").partitionedBy("ingest_date").create()
DELTA_CLUSTERING_COLUMN_MISSING_STATS: Liquid clustering requires clustering columns to have stats...
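For reference, a minimal self-contained sketch of the failing call. The table name and DataFrame below are hypothetical stand-ins, not the original job's code:

from pyspark.sql import functions as F

full_table_name = "main.demo.big_file_hcm"  # hypothetical catalog.schema.table

# Toy DataFrame with an ingest_date column to partition on.
df = spark.range(10).withColumn("ingest_date", F.current_date())

# This create succeeds:
df.writeTo(full_table_name).using("iceberg").create()

# This one raises DELTA_CLUSTERING_COLUMN_MISSING_STATS:
df.writeTo(full_table_name + "_partitioned").using("iceberg").partitionedBy("ingest_date").create()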
07-07-2025 11:28 AM - edited 07-07-2025 11:37 AM
Hi @yzhang ,
First, make sure that you are on Databricks Runtime 16.4 LTS or above (it is required for liquid clustering on Apache Iceberg).
Next, try running the following command:

ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS;

You can also try turning off liquid clustering for that table altogether:

ALTER TABLE <table_name> CLUSTER BY NONE;
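If it's easier to run these from the same PySpark notebook, a minimal sketch with a hypothetical table name:

# Compute Delta-style statistics, then disable liquid clustering.
spark.sql("ANALYZE TABLE main.demo.big_file_hcm COMPUTE DELTA STATISTICS")
spark.sql("ALTER TABLE main.demo.big_file_hcm CLUSTER BY NONE")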
07-07-2025 12:17 PM
Thanks much for the help.
1. Yes, the job runs on 16.4 LTS.
2. ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS
The output is just one line: "ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS was successfully executed."
We have limited knowledge of Databricks; please advise what else I can run to provide more info.
3. ALTER TABLE table_name CLUSTER BY NONE
I don't see how this helps my case. My problem is creating an Iceberg table with the partitionedBy option, and ALTER requires the table to already exist.
4. BTW, ChatGPT summarized my issue as follows; I'm not sure whether it's accurate.
Root Cause
- Unity Catalog appears to default to Delta Lake logic, even when USING ICEBERG is specified
- If PARTITIONED BY (...) is included, UC treats it as a Delta Lake clustering directive, which expects column-level stats
- Since your column didn't have Delta-style stats yet (Iceberg doesn't require them), Databricks throws a misleading Delta error despite your intent to use Iceberg
Why This Is Misleading
- The error references Delta Liquid Clustering, which is a Delta Lake–only feature
- But you are explicitly creating the table with USING ICEBERG
- Your ingest_date column did exist in the data, but the create failed anyway
This implies that:
Even when specifying USING ICEBERG, Databricks internally applies Delta validations, including Liquid Clustering checks, especially when using Unity Catalog.
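If that summary is right, one way to see which code path was taken is to inspect the table that did get created. A sketch with a hypothetical table name; format, clusteringColumns, and properties are fields of the DESCRIBE DETAIL output:

# Inspect what was actually created for the table.
detail = spark.sql("DESCRIBE DETAIL main.demo.big_file_hcm")
detail.select("format", "clusteringColumns", "properties").show(truncate=False)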
07-07-2025 01:25 PM - edited 07-07-2025 01:27 PM
Hi @yzhang ,
Ok, so I forgot that liquid clustering is not compatible with partitioning. But I've got a couple of questions to clarify a bit. You wrote in your reply that you were able to run the following command:

ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS

If so, that means the table is already created. Could you run the following command? What do you see in the result? Anything about clusteringColumns?

DESCRIBE DETAIL csu_metastore_dev.iceberg.big_file_hcm;

If the above command returns info under clusteringColumns, then the following one won't work:

df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()

If you want to partition a table that already exists and has liquid clustering enabled, you need to turn off liquid clustering on that table first.
Use liquid clustering for tables - Azure Databricks | Microsoft Learn
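For that case, a rough sketch (note that createOrReplace replaces the existing table and its data):

# 1. Turn off liquid clustering on the existing table first.
spark.sql("ALTER TABLE csu_metastore_dev.iceberg.big_file_hcm CLUSTER BY NONE")

# 2. Then recreate it partitioned; create() would fail because the table already exists.
(df.writeTo("csu_metastore_dev.iceberg.big_file_hcm")
   .using("iceberg")
   .partitionedBy("ingest_date")
   .createOrReplace())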
07-07-2025 01:58 PM
I am not trying to alter an existing table with the partitionedBy option. To clarify, I wanted to create a (new) table in Iceberg format with the partitionedBy option, but it failed with the Databricks error. I had to create the Iceberg table without partitionedBy.
The clusteringColumns field is an empty array [], and the properties from the schema are ((defaultTableFormat,ICEBERG)), so liquid clustering is not enabled.
Have any of you tried to repro it, if possible?

df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()
07-07-2025 02:33 PM - edited 07-07-2025 02:35 PM
Yes, I tried to recreate a simple example and in my case I have no issue.
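For reference, a sketch of the kind of minimal example that works here, with hypothetical names (not the original code):

from pyspark.sql import functions as F

# Small DataFrame with the partition column, written as a partitioned Iceberg table.
df = spark.range(100).withColumn("ingest_date", F.current_date())
df.writeTo("main.demo.repro_partitioned").using("iceberg").partitionedBy("ingest_date").create()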
a week ago
I found weird behavior here while creating a table using SQL.
If you are creating a new table and add the partition column at the end of the column list, it won't work; but if you add it at the beginning, it will work!!
For example, a CREATE TABLE that declares the partition column first works, but the same statement with the partition column declared last gives the exact error you got (see the sketch below).
So you can try the same in PySpark: keep the column you will be partitioning on in between the other columns rather than at the end.
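A sketch of the two shapes described above; the table and column names are placeholders, not the original queries:

# Reported to work: partition column declared at the beginning of the column list.
spark.sql("""
    CREATE TABLE main.demo.events_ok (
        ingest_date DATE,
        id BIGINT,
        payload STRING)
    USING ICEBERG
    PARTITIONED BY (ingest_date)
""")

# Reported to fail with the same DELTA_CLUSTERING_COLUMN_MISSING_STATS error:
# identical table, but with the partition column declared last.
spark.sql("""
    CREATE TABLE main.demo.events_fail (
        id BIGINT,
        payload STRING,
        ingest_date DATE)
    USING ICEBERG
    PARTITIONED BY (ingest_date)
""")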
Saturday
One observation: can you first load the data into a DataFrame and then write it to a partitioned Iceberg table, rather than creating the table and then writing to it?
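If I read the suggestion right, something like this sketch, with a hypothetical source path and table name:

# Build the DataFrame first, then create the partitioned Iceberg table directly from it.
df = spark.read.parquet("/Volumes/main/demo/raw")  # hypothetical source
df.writeTo("main.demo.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()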
Monday
@Sanjeeb2024 If your question is for me, then I will say it depends on the use case!!
If you have very big data to ingest into the table, you would prefer creating the table first and then ingesting data into it using simultaneous jobs.
Monday
Agree with you @LazyGenius. Yes, for a big volume of data it is better to create the table first and then insert the data. Is your problem resolved?
Wednesday
Yes, my problem was already solved. I just posted my observation because I found this question while searching for a resolution, so hopefully it may help others.
Also, for reference: Databricks currently doesn't support adding data while creating an Iceberg table with a query (you can do that with a Delta table).
So you need to create the table with the required schema first and then add data to it!!
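As a minimal sketch of that create-then-insert pattern, with hypothetical names (partition column declared early, per the column-order observation above):

# 1. Create the Iceberg table with the required schema
#    (creating it with data in one query is not supported for Iceberg here).
spark.sql("""
    CREATE TABLE main.demo.events (
        ingest_date DATE,
        id BIGINT,
        payload STRING)
    USING ICEBERG
    PARTITIONED BY (ingest_date)
""")

# 2. Then load the data in a separate step.
df.writeTo("main.demo.events").append()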