Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

iceberg with partitionedBy option

yzhang
New Contributor III

I am able to create a Unity Catalog table in Iceberg format:
    df.writeTo(full_table_name).using("iceberg").create()

However, if I add the partitionedBy option, I get an error:

    df.writeTo(full_table_name).using("iceberg").partitionedBy("ingest_date").create()

DELTA_CLUSTERING_COLUMN_MISSING_STATS: Liquid clustering requires clustering columns to have stats...

5 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @yzhang ,

First, make sure that you are on Databricks Runtime 16.4 LTS or above (it is required for liquid clustering on Apache Iceberg tables).

Next, try running the following command:

ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS

You can also try turning off liquid clustering for that table altogether:

ALTER TABLE table_name CLUSTER BY NONE;

yzhang
New Contributor III

Thanks very much for the help.

1. Yes, the job runs on 16.4 LTS.

2. ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS

    The output is just one line: ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS was successfully executed.

    We have limited experience with Databricks, so please advise what else I can run to provide more info.

3. ALTER TABLE table_name CLUSTER BY NONE

    I don't see how this helps in my case. My problem is creating an Iceberg table with the partitionedBy option, and ALTER requires the table to exist first.

 

4. By the way, ChatGPT summarized my issue as follows; I am not sure whether it is accurate.

 

Root Cause

  • Unity Catalog appears to default to Delta Lake logic, even when USING ICEBERG is specified
  • If PARTITIONED BY (...) is included, UC treats it as a Delta Lake clustering directive, which expects column-level stats
  • Since your column didn't have Delta-style stats yet (as Iceberg doesn't require them), Databricks throws a misleading Delta error, despite your intent to use Iceberg

Why This Is Misleading

  • The error references Delta Liquid Clustering, which is a Delta Lake-only feature
  • But you are explicitly creating the table with USING ICEBERG
  • Your ingest_date column did exist in the data, but it failed anyway
    This implies that even when specifying USING ICEBERG, Databricks internally applies Delta validations, including Liquid Clustering checks, especially when using Unity Catalog.

szymon_dybczak
Esteemed Contributor III

Hi @yzhang ,

OK, I forgot that liquid clustering is not compatible with partitioning. I have a couple of questions to clarify things, though. You wrote in your reply that you were able to run the following command:

ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS

If so, that means the table already exists. Could you run the following command and share the result? Does it show anything about clusteringColumns?

DESCRIBE DETAIL csu_metastore_dev.iceberg.big_file_hcm;

If the above command returns any clusteringColumns, then the following one won't work:

 df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()

If you want to partition a table that already exists and has liquid clustering enabled, you first need to turn off liquid clustering on that table.
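The check described above could be scripted roughly as follows. This is only a sketch, assuming that DESCRIBE DETAIL returns a single row with a clusteringColumns field for the table in question and that `spark` is the active session in a notebook. The helper names (`clustering_columns`, `disable_clustering_sql`, `ensure_not_clustered`) are hypothetical and not part of any Databricks API.

```python
def clustering_columns(detail: dict) -> list:
    """Return the clustering columns reported by DESCRIBE DETAIL, or [] if none."""
    # A missing or null field is treated the same as an empty list.
    return list(detail.get("clusteringColumns") or [])


def disable_clustering_sql(table_name: str) -> str:
    """Build the statement from this thread that turns liquid clustering off."""
    return f"ALTER TABLE {table_name} CLUSTER BY NONE"


def ensure_not_clustered(spark, table_name: str) -> bool:
    """Turn liquid clustering off only if clustering columns are reported.

    Returns True if the ALTER TABLE statement was issued, False otherwise.
    Assumption: DESCRIBE DETAIL yields exactly one row for this table.
    """
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}").first().asDict()
    if clustering_columns(detail):
        spark.sql(disable_clustering_sql(table_name))
        return True
    return False
```

In a notebook this might be called as `ensure_not_clustered(spark, "csu_metastore_dev.iceberg.big_file_hcm")`. A return value of False would suggest that liquid clustering is not the cause of the error.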


Use liquid clustering for tables - Azure Databricks | Microsoft Learn

[Screenshot: szymon_dybczak_0-1751918325511.png]

 

yzhang
New Contributor III

I am not trying to alter the table with the partitionedBy option. To clarify, I wanted to create a new table in Iceberg format with the partitionedBy option, but it failed with the Databricks error. I had to create the Iceberg table without partitionedBy.

clusteringColumns is an empty array [], and the schema properties are ((defaultTableFormat,ICEBERG)), so liquid clustering is not enabled.

Has anyone tried to reproduce this?

 df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()

 

szymon_dybczak
Esteemed Contributor III

Yes, I tried to recreate a simple example, and in my case there is no issue.
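The attached screenshot is not reproduced in this text, so the exact example is unknown. A minimal reproduction along the same lines might look like the sketch below. The two sample rows, the `id` column, and the helper names are assumptions made for illustration. The write call mirrors the one from the original question, and `spark` is assumed to be the active session on a runtime that supports Iceberg tables in Unity Catalog.

```python
import datetime


def sample_rows():
    """Two hypothetical rows with an ingest_date column to partition on."""
    return [
        (1, datetime.date(2025, 7, 1)),
        (2, datetime.date(2025, 7, 2)),
    ]


def create_partitioned_iceberg_table(spark, table_name: str):
    """Create a new Iceberg table partitioned by ingest_date.

    The table must not exist yet, since create() does not replace
    an existing table.
    """
    df = spark.createDataFrame(sample_rows(), "id INT, ingest_date DATE")
    # Same call chain as in the original question.
    df.writeTo(table_name).using("iceberg").partitionedBy("ingest_date").create()
    return df
```

Running this against a fresh table name in the same catalog and schema should show whether the error depends on the data and table in question or on the workspace configuration.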

[Screenshot: szymon_dybczak_1-1751924096836.png]

 

 
