OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-20-2024 04:07 AM
I created 15 Million records as a Delta Table and i'm running a simple filter query on that table based on one column value - which will return only one record. Because all the values on that column are unique.
Delta Table is not partitioned.
Before enabling Liquid clustering/OPTIMIZE, the query response time was less than a second.
After enabling Liquid Clustering/OPTIMIZE, the query takes 3 to 4 seconds.
If i just enabled Liquid Clustering without OPTIMIZE, then the query response time is less than a second.
What is going on here?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-21-2024 02:04 AM
is the column you query clustered by LQ or not? That could be the explanation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-21-2024 03:50 AM
Yes the column is used as Clustering Key
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-25-2024 08:15 AM
it seems that for this specific query Liquid Clustering has worse performance. It does not have better performance for all queries.
The following are examples of scenarios that benefit from clustering:
Tables often filtered by high cardinality columns.
Tables with significant skew in data distribution.
Tables that grow quickly and require maintenance and tuning effort.
Tables with concurrent write requirements.
Tables with access patterns that change over time.
Tables where a typical partition key could leave the table with too many or too few partitions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-25-2024 08:21 AM
I'm testing a scenario mentioned in the document ( nothing complex)
The filter i'm using is High Cardinality column(every record is unique) and my table is not partitioned (so this is straight forward scenario)
Tables often filtered by high cardinality columns.

