Community Discussions

Error on UC Liquid Clustering

RobsonNLPT
Contributor

I know CLUSTER BY () accepts at most 4 keys, and that it replaces both Z-order and partition keys. I ran into an issue when adding 4 keys: one specific key triggers the error below, which I was not expecting because this is a CREATE TABLE statement. Stats make sense when you need to optimize an existing clustering strategy because the data profile has changed.

DeltaAnalysisException: Liquid clustering requires clustering columns to have stats

Accepted solution

-werners-
Esteemed Contributor III

Are you sure the 4 columns you defined for LC are statistics-enabled?
By default, statistics are collected only on the first 32 columns. This limit can be overridden (up or down).
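A minimal sketch of the rule described above, written as a plain Python model rather than against any Databricks API. It assumes a flat (non-nested) schema given as an ordered list of column names; the helper `keys_without_stats` and the sample columns are hypothetical. `num_indexed_cols` stands in for the Delta table property `delta.dataSkippingNumIndexedCols` (default 32, where -1 means all columns). The check only looks at the schema and the stats setting, never at rows, which is consistent with the error appearing on an empty table.

```python
MAX_CLUSTERING_KEYS = 4        # limit stated in this thread
DEFAULT_NUM_INDEXED_COLS = 32  # default number of columns with stats


def keys_without_stats(columns, keys, num_indexed_cols=DEFAULT_NUM_INDEXED_COLS):
    """Return the clustering keys that fall outside the stats-collected columns.

    Models a flat schema where stats are collected on the first
    `num_indexed_cols` columns in schema order (-1 models "all columns").
    """
    if len(keys) > MAX_CLUSTERING_KEYS:
        raise ValueError(f"at most {MAX_CLUSTERING_KEYS} clustering keys, got {len(keys)}")
    unknown = [k for k in keys if k not in columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    indexed = columns if num_indexed_cols < 0 else columns[:num_indexed_cols]
    return [k for k in keys if k not in indexed]


# Hypothetical 40-column table whose last column is used as a clustering key.
cols = [f"c{i}" for i in range(40)]
print(keys_without_stats(cols, ["c0", "c1", "c2", "c39"]))          # ['c39']
print(keys_without_stats(cols, ["c0", "c39"], num_indexed_cols=40))  # []
```

Any key returned by this model is one that would trigger the "requires clustering columns to have stats" error under the stated assumptions.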



Kaniz_Fatma
Community Manager

Hi @RobsonNLPT, Liquid clustering is a powerful feature available in Delta Lake, specifically designed to simplify data layout decisions and optimize query performance.

Here are some key points about liquid clustering:

What Is Liquid Clustering Used For?

  • Databricks recommends using liquid clustering for all new Delta tables.
  • It is particularly beneficial in scenarios where:
    • Tables are often filtered by high cardinality columns.
    • Data distribution exhibits significant skew.
    • Tables grow rapidly and require ongoing maintenance and tuning.
    • Concurrent write requirements exist.
    • Access patterns evolve over time.
    • A typical partition key could result in too many or too few partitions.
  • Liquid clustering allows you to redefine clustering keys without rewriting existing data, adapting to changing analytic needs over time.

Enabling Liquid Clustering:

  • When creating a table, you must explicitly enable liquid clustering.
  • Add the CLUSTER BY phrase to your table creation statement.
  • Liquid clustering replaces traditional table partitioning and ZORDER.
  • It requires that Azure Databricks clients manage all layout and optimization operations for data in your table.
  • Once enabled, you can run OPTIMIZE jobs to incrementally cluster data.

Column Statistics and Clustering Keys:

  • You can only specify columns with statistics collected as clustering keys.
  • By default, the first 32 columns in a Delta table have statistics collected.
  • You can specify up to 4 columns as clustering keys.
  • Structured Streaming workloads do not support clustering-on-write.

Compatibility and Requirements:

  • Databricks Runtime 13.3 LTS and above is required to create, write, or optimize Delta tables with liquid clustering enabled.
  • Tables with liquid clustering support row-level concurrency in Databricks Runtime 13.3 LTS and above.
  • Row-level concurrency is generally available in Databricks Runtime 14.2 and above for all tables with deletion vectors enabled.
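The version rules in the list above can be modeled as simple tuple comparisons. This is a sketch under stated assumptions: runtime versions are given as strings like "13.3 LTS", and the helper names `parse_runtime`, `supports_liquid_clustering` and `row_level_concurrency` are hypothetical. It only encodes the three bullets above and is not an official compatibility matrix.

```python
# Minimum Databricks Runtime versions quoted in the list above.
LC_MIN_RUNTIME = (13, 3)                    # create/write/optimize LC tables
ROW_LEVEL_CONCURRENCY_GA_RUNTIME = (14, 2)  # GA for tables with deletion vectors


def parse_runtime(version):
    """Parse a runtime string such as '13.3' or '14.2 LTS' into (major, minor)."""
    major, minor = version.split()[0].split(".")[:2]
    return int(major), int(minor)


def supports_liquid_clustering(version):
    """True if the runtime meets the 13.3 LTS minimum for liquid clustering."""
    return parse_runtime(version) >= LC_MIN_RUNTIME


def row_level_concurrency(version, liquid_clustering, deletion_vectors):
    """Model of the row-level concurrency bullets above."""
    v = parse_runtime(version)
    if liquid_clustering and v >= LC_MIN_RUNTIME:
        return True
    return deletion_vectors and v >= ROW_LEVEL_CONCURRENCY_GA_RUNTIME


print(supports_liquid_clustering("12.2 LTS"))   # False
print(row_level_concurrency("14.1", False, True))  # False
```

Comparing (major, minor) tuples avoids the string-comparison trap where "9.1" would sort after "13.3".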

Example Syntax for Enabling Liquid Clustering:

  • Using SQL:

    -- Create an empty table
    CREATE TABLE table1 (col0 int, col1 string) USING DELTA CLUSTER BY (col0);

    -- Using a CTAS statement
    CREATE EXTERNAL TABLE table2 CLUSTER BY (col0) LOCATION 'table_location' AS SELECT * FROM table1;

    -- Using a LIKE statement to copy configurations
    CREATE TABLE table3 LIKE table1;
  • Note that tables created with liquid clustering enabled have specific Delta table features enabled and use specific Delta writer and reader versions.
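Given the stats requirement discussed in this thread, one way to avoid the error at creation time is to declare the clustering keys first in the schema, so they fall inside the default window of 32 stats columns. The sketch below builds such a CREATE TABLE statement as a string; `create_table_ddl` is a hypothetical helper, and the column list is assumed to be an ordered list of (name, type) pairs for a flat schema.

```python
def create_table_ddl(table, columns, cluster_by):
    """Build a CREATE TABLE ... CLUSTER BY statement (hypothetical helper).

    `columns` is an ordered list of (name, type) pairs. Clustering keys are
    moved to the front so they sit inside the default stats window of 32.
    """
    if len(cluster_by) > 4:
        raise ValueError("liquid clustering allows at most 4 clustering keys")
    by_name = dict(columns)
    missing = [k for k in cluster_by if k not in by_name]
    if missing:
        raise ValueError(f"clustering keys not in schema: {missing}")
    # Keys first (in CLUSTER BY order), then the remaining columns unchanged.
    ordered = [(k, by_name[k]) for k in cluster_by]
    ordered += [(n, t) for n, t in columns if n not in cluster_by]
    cols_sql = ", ".join(f"{n} {t}" for n, t in ordered)
    keys_sql = ", ".join(cluster_by)
    return f"CREATE TABLE {table} ({cols_sql}) USING DELTA CLUSTER BY ({keys_sql});"


print(create_table_ddl("table1", [("col1", "string"), ("col0", "int")], ["col0"]))
# CREATE TABLE table1 (col0 int, col1 string) USING DELTA CLUSTER BY (col0);
```

The trade-off is that the table's column order changes, which matters for `SELECT *` consumers and positional inserts.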

Remember, liquid clustering empowers you to adapt your data layout dynamically, making it a valuable tool for optimizing query performance in evolving scenarios.

For more details, you can refer to the official Databricks documentation on liquid clustering.

I'm aware of the feature and its definitions. I've created an empty table with LC enabled, using 4 keys.

I've got the error "DeltaAnalysisException: Liquid clustering requires clustering columns to have stats"

How is that possible if I just created an empty table?

-werners-
Esteemed Contributor III

Are you sure the 4 columns you defined for LC are statistics-enabled?
By default, statistics are collected only on the first 32 columns. This limit can be overridden (up or down).

My table has only 19 columns.

That's correct. One of the keys I was trying to cluster by is not in the first 32 positions.
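When a key sits outside the stats window, the smallest override that covers it is the 1-based position of the last clustering key in the schema. A minimal sketch, assuming a flat schema listed in declaration order; `min_num_indexed_cols` and the sample columns are hypothetical.

```python
def min_num_indexed_cols(columns, keys):
    """Smallest stats-column count that covers every clustering key.

    Models a flat schema where stats are collected on the first N columns
    in schema order; returns the 1-based position of the last key.
    list.index raises ValueError if a key is not in the schema.
    """
    positions = [columns.index(k) + 1 for k in keys]
    return max(positions)


cols = [f"c{i}" for i in range(40)]
print(min_num_indexed_cols(cols, ["c0", "c5", "c35"]))  # 36
```

Under these assumptions, either `delta.dataSkippingNumIndexedCols` would need to be at least the returned value, or the offending key would need to move earlier in the schema.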

Thank you 

JurneeSalinas
New Contributor II

Any update?
