iceberg with partitionedBy option

yzhang — Mon, 07 Jul 2025 17:18:05 GMT

I am able to create a UnityCatalog iceberg format table:
df.writeTo(full_table_name).using("iceberg").create()

However, if I add the partitionedBy option, I get an error:

df.writeTo(full_table_name).using("iceberg").partitionedBy("ingest_date").create()

DELTA_CLUSTERING_COLUMN_MISSING_STATS: Liquid clustering requires clustering columns to have stats...

Re: iceberg with partitionedBy option

szymon_dybczak — Mon, 07 Jul 2025 18:37:01 GMT

Hi @yzhang ,

First, make sure that you are on Databricks Runtime 16.4 LTS or above (it is required for liquid clustering on Apache Iceberg tables).

Next, try running the following command:

ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS

You can also try turning off liquid clustering for that table altogether:

ALTER TABLE table_name CLUSTER BY NONE;

Re: iceberg with partitionedBy option

yzhang — Mon, 07 Jul 2025 19:17:55 GMT

Thanks much for the help.

1. Yes, the job is run on 16.4 LTS.

2. ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS

The output is just one line: ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS was successfully executed.

We have limited experience with Databricks; please advise what else I can run to provide more info.

3. ALTER TABLE table_name CLUSTER BY NONE

I don't see how this helps in my case. My problem is creating an Iceberg table with the partitionedBy option, and ALTER needs the table to exist first.

4. btw, ChatGPT summarized my issue, not sure if this is true.

Root Cause

Unity Catalog appears to default to Delta Lake logic, even when USING ICEBERG is specified
If PARTITIONED BY (...) is included, UC treats it as a Delta Lake clustering directive, which expects column-level stats
Since your column didn’t have Delta-style stats yet (as Iceberg doesn’t require them), Databricks throws a misleading Delta error — despite your intent to use Iceberg

Why This Is Misleading

The error references Delta Liquid Clustering, which is a Delta Lake–only feature
But you are explicitly creating the table with USING ICEBERG
Your ingest_date column did exist in the data — but it failed anyway
This implies that:
Even when specifying USING ICEBERG, Databricks internally applies Delta validations, including Liquid Clustering checks, especially when using Unity Catalog.

Re: iceberg with partitionedBy option

szymon_dybczak — Mon, 07 Jul 2025 20:27:50 GMT

Hi @yzhang ,

OK, so I forgot that liquid clustering is not compatible with partitioning. But I have a couple of questions to clarify things. You wrote in your reply that you were able to run the following command:

ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS

If so, that means the table has already been created. Could you run the following command? What do you see in the result?
Is there anything about clusteringColumns?

DESCRIBE DETAIL csu_metastore_dev.iceberg.big_file_hcm;

If the above command returns anything for clusteringColumns, then the following one won't work.

df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()

If you want to partition a table that already exists and has liquid clustering enabled, you first need to turn off liquid clustering on that table.

Use liquid clustering for tables - Azure Databricks | Microsoft Learn
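
To check for clustering columns programmatically, something like the following might work. It is only a rough sketch: `clustering_columns` is a made-up helper name, `spark` is assumed to be the active SparkSession in a Databricks notebook, and it assumes DESCRIBE DETAIL exposes a `clusteringColumns` field for this table.

```python
def clustering_columns(spark, table_name):
    """Return the clusteringColumns reported by DESCRIBE DETAIL (hypothetical helper)."""
    # DESCRIBE DETAIL returns a single row describing the table.
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}").collect()[0]
    # Assumption: the row exposes a clusteringColumns field; treat a missing value as empty.
    return list(detail["clusteringColumns"] or [])


# In a notebook, where `spark` already exists:
# clustering_columns(spark, "csu_metastore_dev.iceberg.big_file_hcm")
```

An empty list would suggest that no liquid clustering is set on the table.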

Re: iceberg with partitionedBy option

yzhang — Mon, 07 Jul 2025 20:58:30 GMT

I am not trying to alter the table with the partitionedBy option. To clarify, I wanted to create the (new) table with the partitionedBy option in Iceberg format, but it failed with a Databricks error. I had to create the Iceberg table without partitionedBy.

clusteringColumns is an empty array [], and the properties on my schema are ((defaultTableFormat,ICEBERG)); liquid clustering is not enabled.

Has anyone tried to reproduce this, if possible?

df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()

Re: iceberg with partitionedBy option

szymon_dybczak — Mon, 07 Jul 2025 21:35:06 GMT

Yes, I tried to reproduce it with a simple example, and in my case there was no issue.

Re: iceberg with partitionedBy option

LazyGenius — Fri, 02 Jan 2026 08:04:40 GMT

I found some weird behavior here when creating a table using SQL.
If you create a new table and list the partition column last in the column definitions, it won't work, but if you list it earlier, it works!
For example, the query below works:

CREATE TABLE IF NOT EXISTS schema_name.table_name (
  id BIGINT,
  partition_column STRING,
  other_column1 DOUBLE,
  other_column2 DOUBLE
)
USING ICEBERG
PARTITIONED BY (partition_column);

But the following one gives the same error you got:

CREATE TABLE IF NOT EXISTS schema_name.table_name (
  id BIGINT,
  other_column1 DOUBLE,
  other_column2 DOUBLE,
  partition_column STRING
)
USING ICEBERG
PARTITIONED BY (partition_column);

So you can try the same in PySpark: keep the partition column somewhere before the last position in the column order.
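
In PySpark, that reordering could look roughly like this. It is a sketch only: `partition_column_first` is a hypothetical helper that moves the partition column to the front (which keeps it out of the last position), and whether column position really affects the error is just what I observed with SQL, not confirmed behavior.

```python
def partition_column_first(columns, partition_col):
    """Return the column names with partition_col moved to the front (hypothetical helper)."""
    if partition_col not in columns:
        raise ValueError(f"{partition_col!r} is not one of the columns")
    return [partition_col] + [c for c in columns if c != partition_col]


# Example with the column names from the failing SQL above:
cols = ["id", "other_column1", "other_column2", "partition_column"]
print(partition_column_first(cols, "partition_column"))

# With a real DataFrame `df` (assumed to exist), the reordered write would be:
# ordered = partition_column_first(df.columns, "ingest_date")
# (df.select(*ordered)
#    .writeTo(full_table_name)
#    .using("iceberg")
#    .partitionedBy("ingest_date")
#    .create())
```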

Re: iceberg with partitionedBy option

Sanjeeb2024 — Sun, 04 Jan 2026 06:51:42 GMT

One suggestion: can you first load the data into a DataFrame and then write it to a partitioned Iceberg table, rather than creating the table first and writing to it afterwards?

Re: iceberg with partitionedBy option

LazyGenius — Mon, 05 Jan 2026 09:13:20 GMT

@Sanjeeb2024 If your question is for me, I'd say it depends on the use case!
If you have a very large amount of data to ingest, you would prefer to create the table first and then load data into it with concurrent jobs.

Re: iceberg with partitionedBy option

Sanjeeb2024 — Mon, 05 Jan 2026 10:43:21 GMT

Agreed, @LazyGenius. Yes, for large volumes of data it's better to create the table first and then insert the data. Is your problem resolved?

Re: iceberg with partitionedBy option

LazyGenius — Thu, 08 Jan 2026 07:20:20 GMT

Yes, my problem was already solved. I just posted my observation because I found this question while searching for a fix, so hopefully it helps others.
Also, for reference: Databricks currently doesn't support adding data in the same query that creates an Iceberg table (you can do that with a Delta table).
So you need to create the table with the required schema first and then add data to it.
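
A rough sketch of that two-step pattern in PySpark, assuming `spark` and `df` already exist in the notebook. `create_table_ddl` and `create_then_append` are hypothetical helper names, and the column names and types are placeholders taken from my earlier example.

```python
def create_table_ddl(table_name, columns, partition_col):
    """Build a CREATE TABLE statement for an Iceberg table (hypothetical helper).

    columns is a list of (name, sql_type) pairs.
    """
    col_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n  {col_defs}\n)\n"
        f"USING ICEBERG\n"
        f"PARTITIONED BY ({partition_col})"
    )


def create_then_append(spark, df, table_name, columns, partition_col):
    """Step 1: create the empty table. Step 2: append the DataFrame to it."""
    spark.sql(create_table_ddl(table_name, columns, partition_col))
    df.writeTo(table_name).append()


# Print the statement that step 1 would run (placeholder names and types).
print(create_table_ddl(
    "schema_name.table_name",
    [("id", "BIGINT"), ("partition_column", "STRING"), ("other_column1", "DOUBLE")],
    "partition_column",
))
```

Separating the DDL from the append also means several jobs can append to the same table once it exists.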