
Issue with Auto Liquid clustering

RevanthV
New Contributor

 

  • I have written data to a table with clusterByAuto set to true.
  • However, no clustering keys are selected automatically when I run DESC DETAIL on the table.

    Why are clustering columns not being selected automatically?

Repro steps:

  1. Create a dataframe
  2. Write to a delta table using clusterByAuto set to true
  3. Then run DESC DETAIL on the table. You will not see any clustering columns selected.
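
The repro steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the poster's actual code: the helper names and the table name in the usage comment are hypothetical, and it assumes a Databricks notebook where `spark` is already defined and the runtime supports automatic liquid clustering.

```python
def write_with_auto_clustering(df, table_name):
    """Write df as a Delta table and request automatic liquid clustering."""
    (
        df.write.format("delta")
        .option("clusterByAuto", "true")  # the option named in the question
        .mode("overwrite")
        .saveAsTable(table_name)
    )


def clustering_columns(spark, table_name):
    """Return the clustering columns reported by DESCRIBE DETAIL (may be empty)."""
    row = spark.sql(f"DESCRIBE DETAIL {table_name}").first()
    # An empty or missing value means no clustering keys have been selected yet.
    return list(row["clusteringColumns"] or [])


# Usage on a Databricks cluster (table name is a placeholder):
# df = spark.range(1000).withColumnRenamed("id", "order_id")
# write_with_auto_clustering(df, "main.default.orders_demo")
# print(clustering_columns(spark, "main.default.orders_demo"))
```

An empty list right after the first write matches the behavior described in the question; it does not by itself mean the option was ignored.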
1 ACCEPTED SOLUTION

Accepted Solutions

K_Anudeep
New Contributor III

Hello Revanth,

A few possible reasons:

  • The table is too small to benefit from liquid clustering, or it already has a good clustering scheme. Could you tell me the size of the table?
  • The table is not being queried frequently on a column or set of columns, i.e., there are not enough scans on the table, so it wouldn't benefit from clustering.
  • Auto liquid clustering works with Predictive Optimization (PO) in the background, so the clustering columns are selected based on internal thresholds, and we as customers need not worry about that.

So once you have frequent scans on the table columns, you will see clustering columns being selected automatically.
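
To answer the table-size question above, one option is to read the size from the same DESCRIBE DETAIL output. The sketch below assumes a Databricks notebook where `spark` is defined; the helper name is hypothetical, and it does not know the internal thresholds mentioned above, so it only reports the numbers.

```python
def table_size_summary(spark, table_name):
    """Summarize file count and size of a Delta table from DESCRIBE DETAIL."""
    row = spark.sql(f"DESCRIBE DETAIL {table_name}").first()
    size_bytes = row["sizeInBytes"]
    return {
        "num_files": row["numFiles"],
        "size_mib": round(size_bytes / (1024 * 1024), 2),
    }


# Usage (table name is a placeholder):
# print(table_size_summary(spark, "main.default.orders_demo"))
```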

You can go through the doc for a better understanding: https://docs.databricks.com/aws/en/delta/clustering#:~:text=If%20a%20key%20was%20not%20selected%20by...

Let me know if you have any further questions.


3 REPLIES

Thanks a lot @K_Anudeep. My table was still small, and I guess that was the reason. I have now written around 1 million records and have been running frequent scans on a column for the last hour, and I can now see that same column selected as a clustering column.

Thanks a lot for your help

szymon_dybczak
Esteemed Contributor III

Hi @RevanthV ,

As @K_Anudeep correctly suggested, it could be that your table is too small to benefit from liquid clustering.
Another possibility is that you're using a Databricks Runtime version lower than 15.4 LTS.
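
A quick way to rule out the runtime version is to compare it against 15.4. The sketch below assumes the cluster exposes its version in the `DATABRICKS_RUNTIME_VERSION` environment variable; the helper names are hypothetical, and version strings that are not in `major.minor` form are treated as unknown rather than guessed.

```python
import os

MIN_RUNTIME = (15, 4)  # 15.4 LTS, the minimum mentioned in the reply above


def parse_runtime(version):
    """Turn a string such as '15.4' or '15.4.x-scala2.12' into (15, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runtime_supports_auto_clustering(version=None):
    """Return True/False for a parsable version, or None if it is unknown."""
    # Assumption: Databricks sets this environment variable on the cluster.
    version = version or os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if not version:
        return None
    try:
        return parse_runtime(version) >= MIN_RUNTIME
    except ValueError:
        return None  # unexpected format, so do not guess


# Usage in a notebook cell:
# print(runtime_supports_auto_clustering())
```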