Databricks Community

HoussemBL · ‎04-14-2025

Hi everyone,

I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality.

However, I'm having trouble figuring out the correct syntax to integrate automatic liquid clustering within my DLT pipeline. I've tried the following code, but it doesn't seem to be working as expected.

dlt.create_streaming_table(
    "table_a",
    schema="""
        id STRING NOT NULL,
        description STRING NOT NULL,
        is_current BOOLEAN NOT NULL,
    """,
    cluster_by=["auto"],
    comment="table a with automatic liquid clustering",
)

Could someone please provide an example of the correct syntax for using automatic liquid clustering within a Databricks DLT pipeline? Any guidance or best practices would be greatly appreciated!

Thanks in advance!

notwarte · ‎04-14-2025

Hi!

I think it's worth trying the same syntax, as is shown here: https://docs.databricks.com/aws/en/delta/clustering?language=Python

notwarte · ‎04-14-2025

Also: https://community.databricks.com/t5/community-platform-discussions/cluster-by-auto-pyspark/m-p/11531...

HoussemBL · ‎04-15-2025

Thanks a lot for your reply @notwarte
I can't really use the links you suggested because I am implementing a DLT pipeline. The DLT Python syntax is different, especially when it comes to creating tables.

RiyazAliM · ‎04-16-2025

Hey @HoussemBL

You're correct that DLT does not support Auto LC. You can assign any columns in cluster_by, but if you set it to auto, it throws an error complaining that auto is not present in the list of columns.

Maybe altering the table to set/reset the LC is the only option left as of now.
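
A minimal sketch of that ALTER TABLE workaround, assuming the table can be altered from a notebook or job outside the pipeline (this thread does not confirm that a pipeline-managed table accepts it). The helper name build_clustering_statement and the table name main.demo.table_a are hypothetical; only the CLUSTER BY AUTO, CLUSTER BY (cols), and CLUSTER BY NONE clauses come from Databricks SQL.

```python
from typing import Optional, Sequence


def build_clustering_statement(
    table: str,
    columns: Optional[Sequence[str]] = None,
    auto: bool = False,
) -> str:
    """Return an ALTER TABLE statement that sets or resets liquid clustering."""
    if auto:
        # Let Databricks choose the clustering keys.
        clause = "CLUSTER BY AUTO"
    elif columns:
        # Set explicit clustering keys.
        clause = "CLUSTER BY (" + ", ".join(columns) + ")"
    else:
        # Reset: remove liquid clustering from the table.
        clause = "CLUSTER BY NONE"
    return f"ALTER TABLE {table} {clause}"


# Hypothetical three-level table name, for illustration only.
statement = build_clustering_statement("main.demo.table_a", auto=True)
print(statement)

# On Databricks, where a `spark` session is available, this could be run as:
# spark.sql(statement)
```

Building the statement separately from running it makes it easy to review what would be executed before touching the table.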

Let me know your thoughts.

Cheers!

Riz

mai_luca · ‎05-08-2025

It works with SQL syntax (using CLUSTER BY AUTO), but not with PySpark.

mai_luca · ‎06-10-2025

You can now use Automatic Liquid Clustering with Python:

import dlt

# `spark` is the session provided by the pipeline runtime.

# Enabling Automatic Liquid Clustering on a new table
@dlt.table(cluster_by_auto=True)
def tbl_with_auto():
    return spark.range(5)

# Manually choosing a clustering key initially, followed by automatic clustering
@dlt.table(cluster_by_auto=True, cluster_by=["id"])
def tbl_with_auto_and_initial_hint():
    return spark.range(5)

HoussemBL · ‎06-13-2025

Hi @mai_luca

Still unfortunately getting an error when attempting to run your code. Here's the specific error message:

org.apache.spark.sql.AnalysisException: [CLUSTER_BY_AUTO_REQUIRES_PREDICTIVE_OPTIMIZATION] 
CLUSTER BY AUTO requires Predictive Optimization to be enabled. 
SQLSTATE: 56038

Additional context:

Predictive Optimization is enabled in our Databricks account.
According to the documentation, this feature should be automatically enabled for all workspaces, catalogs, and tables.

Is there any extra setting that should be added to the DLT pipeline definition?

mai_luca · ‎06-13-2025

Hi @HoussemBL, I had the same issue. As far as I know, automatic liquid clustering on DLT is in private preview. I would suggest contacting your sales representative to enable it 🙂

nikhilj0421 · ‎06-13-2025

@HoussemBL, you can check whether PO is enabled for the target catalog in DLT.
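
One way this check could be scripted, as a rough sketch: Databricks documents that DESCRIBE (CATALOG | SCHEMA | TABLE) EXTENDED reports a "Predictive Optimization" field. The helper names below are hypothetical, the sample rows are made up for illustration, and the exact value text returned by a real workspace is an assumption.

```python
from typing import Iterable, Optional, Tuple

# Field name reported by DESCRIBE ... EXTENDED.
PO_KEY = "Predictive Optimization"


def describe_statement(kind: str, name: str) -> str:
    """Build a DESCRIBE ... EXTENDED statement for a catalog, schema, or table."""
    kind = kind.upper()
    if kind not in {"CATALOG", "SCHEMA", "TABLE"}:
        raise ValueError(f"unsupported object kind: {kind}")
    return f"DESCRIBE {kind} EXTENDED {name}"


def predictive_optimization_value(rows: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the value of the Predictive Optimization row, or None if absent."""
    for key, value in rows:
        if key.strip() == PO_KEY:
            return value
    return None


# On Databricks the rows would come from something like:
# rows = [(r[0], r[1]) for r in spark.sql(describe_statement("catalog", "main")).collect()]

# Made-up sample rows, for illustration only.
sample_rows = [
    ("Catalog Name", "main"),
    ("Predictive Optimization", "ENABLE (inherited from METASTORE)"),
]
print(describe_statement("catalog", "main"))
print(predictive_optimization_value(sample_rows))
```

Running the same check at catalog, schema, and table level shows at which level the setting is inherited or overridden.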

Alex006 · ‎06-14-2025

Same issue here. I have activated PO on the specific schema where the materialized view resides, per these instructions: https://docs.databricks.com/aws/en/optimizations/predictive-optimization#check-whether-predictive-op...
This doesn't help with the issue.

Problem hypothesis: DLT (newly renamed to Lakeflow Declarative Pipelines) is not creating Unity Catalog managed tables, which are a precondition for Predictive Optimization, which in turn is a precondition for automatic liquid clustering.

Context:
- Predictive Optimization is enabled on the account and on the specific Unity Catalog schemas used.
- Other tables (not created by DLT) in the schemas are Unity Catalog managed, and Unity Catalog shows the validation in the UI.

[Image: Proof of PO being activated for the schema]

Question:
- Is DLT not capable of creating Unity Catalog managed tables?

jsturgeon · ‎08-21-2025

Is there a resolution to this? I am having the same problem. I can create tables with CLUSTER BY AUTO, but the MVs are failing, saying I need to enable PO. This was working yesterday and is working in other environments.