04-14-2025 12:26 AM
Hi everyone,
I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality.
However, I'm having trouble figuring out the correct syntax to integrate automatic liquid clustering within my DLT pipeline. I've tried the following code, but it doesn't seem to be working as expected.
dlt.create_streaming_table(
    "table_a",
    schema="""
        id STRING NOT NULL,
        description STRING NOT NULL,
        is_current BOOLEAN NOT NULL
    """,
    cluster_by=["auto"],  # this is the part I can't get to work
    comment="table a with automatic liquid clustering",
)
Could someone please provide an example of the correct syntax for using automatic liquid clustering within a Databricks DLT pipeline? Any guidance or best practices would be greatly appreciated!
Thanks in advance!
04-14-2025 07:53 AM
Hi!
I think it's worth trying the syntax shown here: https://docs.databricks.com/aws/en/delta/clustering?language=Python
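For reference, the non-DLT Python syntax on that page looks roughly like this (a sketch; the table and column names are placeholders):

# Illustrative non-DLT example based on the linked docs; names are placeholders
df = spark.read.table("my_catalog.my_schema.source_table")
(df.writeTo("my_catalog.my_schema.target_table")
    .using("delta")
    .clusterBy("id")  # liquid clustering keyed on a chosen column
    .create())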
04-15-2025 02:13 AM
Thanks a lot for your reply @notwarte
I can't really use the link you suggested, as I'm implementing a DLT pipeline. The DLT Python syntax is different, especially when it comes to creating tables.
04-16-2025 09:51 AM
Hey @HoussemBL
You're correct that DLT doesn't support Auto LC. You can assign any columns in cluster_by, but if you set it to auto, it throws an error complaining that auto is not present in the list of columns.
Maybe altering the table to set/reset LC is the only option left as of now; something like the sketch below.
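A minimal sketch of that workaround, run as a separate statement after the table exists (the three-level table name is a placeholder, and I'm using spark.sql only to issue the SQL from Python):

# Hypothetical workaround: enable automatic liquid clustering on an existing table
# "my_catalog.my_schema.table_a" is a placeholder name
spark.sql("ALTER TABLE my_catalog.my_schema.table_a CLUSTER BY AUTO")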
Let me know your thoughts.
Cheers!
05-08-2025 02:30 AM
It works with SQL syntax (using CLUSTER BY AUTO), but not with PySpark.
06-10-2025 01:51 AM
You can now use Automatic Liquid Clustering with Python:
import dlt

# Enable automatic liquid clustering on a new table
@dlt.table(cluster_by_auto=True)
def tbl_with_auto():
    return spark.range(5)

# Choose a clustering key manually at first, then let automatic clustering take over
@dlt.table(cluster_by_auto=True, cluster_by=["id"])
def tbl_with_auto_and_initial_hint():
    return spark.range(5)
06-13-2025 03:44 AM
Hi @mai_luca
Unfortunately, I'm still getting an error when running your code. Here's the specific error message:
org.apache.spark.sql.AnalysisException: [CLUSTER_BY_AUTO_REQUIRES_PREDICTIVE_OPTIMIZATION]
CLUSTER BY AUTO requires Predictive Optimization to be enabled.
SQLSTATE: 56038
Additional context:
Predictive Optimization is enabled in our Databricks account.
According to the documentation, this feature should be automatically enabled for all workspaces, catalogs, and tables.
Is there any extra setting that should be added in DLT pipeline definition?
06-13-2025 04:30 AM
Hi @HoussemBL, I had the same issue. As far as I know, automatic liquid clustering in DLT is in private preview, so I would suggest contacting your sales representative to enable it 🙂
06-13-2025 06:03 AM
@HoussemBL, you can check whether PO (Predictive Optimization) is enabled for the target catalog used by the DLT pipeline.
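A minimal sketch of that check, assuming DESCRIBE ... EXTENDED reports the Predictive Optimization status as described in the docs (catalog and schema names are placeholders):

# Hypothetical names; look for the Predictive Optimization field in the output
spark.sql("DESCRIBE CATALOG EXTENDED my_catalog").show(truncate=False)
spark.sql("DESCRIBE SCHEMA EXTENDED my_catalog.my_schema").show(truncate=False)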
06-14-2025 04:06 AM - edited 06-14-2025 04:15 AM
Same issue here. I have activated PO on the specific schema where the materialized view resides, per these instructions: https://docs.databricks.com/aws/en/optimizations/predictive-optimization#check-whether-predictive-op...
- This doesn't help with the issue.
Problem hypothesis: DLT (recently renamed to Lakeflow Declarative Pipelines) is not creating Unity Catalog managed tables, which is a precondition for Predictive Optimization, which in turn is a precondition for automatic liquid clustering.
Context:
- Predictive Optimization is enabled on the account and on the specific Unity Catalog schemas used.
- Other (non-DLT-created) tables in the schemas are Unity Catalog managed, and Unity Catalog shows the validation in the UI.
[Image: proof of PO being activated for the schema]
Question:
- Is DLT not capable of creating Unity Catalog managed tables? One way to check is sketched below.
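A quick way to verify whether a pipeline-created table is managed (a sketch; the table name is a placeholder):

# Hypothetical table name; the "Type" row in the output shows MANAGED vs EXTERNAL
spark.sql("DESCRIBE TABLE EXTENDED my_catalog.my_schema.table_a").show(truncate=False)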
2 weeks ago
Is there a resolution to this? I'm having the same problem. I can create tables with CLUSTER BY AUTO, but the MVs fail, saying I need to enable PO. This was working yesterday and is working in other environments.