DLT Pipeline & Automatic Liquid Clustering Syntax

HoussemBL — Mon, 14 Apr 2025 07:26:26 GMT

Hi everyone,

I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality.

However, I'm having trouble figuring out the correct syntax to integrate automatic liquid clustering within my DLT pipeline. I've tried the following code, but it doesn't seem to be working as expected.

dlt.create_streaming_table(
    "table_a",
    schema="""
        id STRING NOT NULL,
        description STRING NOT NULL,
        is_current BOOLEAN NOT NULL,
    """,
    cluster_by=["auto"],
    comment="table a with automatic liquid clustering",
)

Could someone please provide an example of the correct syntax for using automatic liquid clustering within a Databricks DLT pipeline? Any guidance or best practices would be greatly appreciated!

Thanks in advance!

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

notwarte — Mon, 14 Apr 2025 14:53:41 GMT

Hi!

I think it's worth trying the same syntax, as is shown here: https://docs.databricks.com/aws/en/delta/clustering?language=Python

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

notwarte — Mon, 14 Apr 2025 15:55:53 GMT

Also: https://community.databricks.com/t5/community-platform-discussions/cluster-by-auto-pyspark/m-p/115310#M9863

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

HoussemBL — Tue, 15 Apr 2025 09:13:30 GMT

Thanks a lot for your reply, @notwarte.
I cannot really use the links you suggest because I am implementing a DLT pipeline. The DLT Python syntax is different, especially when it comes to creating tables.

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

RiyazAliM — Wed, 16 Apr 2025 16:51:11 GMT

Hey @HoussemBL

You're correct that DLT does not support Auto LC. You can assign any columns in cluster_by, but if you set it to auto, it throws an error complaining that auto is not present in the list of columns.

Altering the table to set/reset the LC may be the only option left as of now.

Let me know your thoughts.

Cheers!

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

lucami — Thu, 08 May 2025 09:30:19 GMT

It works with the SQL syntax (using CLUSTER BY AUTO), but not with PySpark.

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

lucami — Tue, 10 Jun 2025 08:51:29 GMT

You can now use Automatic Liquid Clustering with Python:

# Enabling Automatic Liquid Clustering on a new table
@dlt.table(cluster_by_auto=True)
def tbl_with_auto():
    return spark.range(5)

# Manually choosing a clustering key initially, followed by automatic clustering
@dlt.table(cluster_by_auto=True, cluster_by=["id"])
def tbl_with_auto_and_initial_hint():
    return spark.range(5)
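For the create_streaming_table call from the first post, the equivalent would presumably look like the sketch below. It assumes that dlt.create_streaming_table accepts the same cluster_by_auto flag as the @dlt.table decorator in the snippet above, which this thread does not confirm. The streaming_table_options helper is hypothetical and only keeps the arguments in one place.

```python
def streaming_table_options(name, schema, comment, initial_keys=None):
    """Build keyword arguments for dlt.create_streaming_table (hypothetical helper)."""
    options = {
        "name": name,
        "schema": schema,
        "comment": comment,
        # Assumption: same flag as in the @dlt.table examples above.
        "cluster_by_auto": True,
    }
    if initial_keys:
        # Optional initial clustering keys, as in the second @dlt.table example.
        options["cluster_by"] = list(initial_keys)
    return options


TABLE_A_OPTIONS = streaming_table_options(
    name="table_a",
    schema="id STRING NOT NULL, description STRING NOT NULL, is_current BOOLEAN NOT NULL",
    comment="table a with automatic liquid clustering",
)

try:
    import dlt  # only available when the code runs inside a pipeline
except ImportError:
    dlt = None

if dlt is not None:
    # Replaces cluster_by=["auto"], which is read as a column named "auto".
    dlt.create_streaming_table(**TABLE_A_OPTIONS)
```

Passing initial_keys=["id"] would mirror the "initial hint" variant: a manual clustering key first, with automatic selection afterwards.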

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

HoussemBL — Fri, 13 Jun 2025 10:44:52 GMT

Hi @lucami

Still unfortunately getting an error when attempting to run your code. Here's the specific error message:

org.apache.spark.sql.AnalysisException: [CLUSTER_BY_AUTO_REQUIRES_PREDICTIVE_OPTIMIZATION] 
CLUSTER BY AUTO requires Predictive Optimization to be enabled. 
SQLSTATE: 56038

Additional context:

Predictive Optimization is enabled in our Databricks account.
According to the documentation, this feature should be automatically enabled for all workspaces, catalogs, and tables.

Is there any extra setting that should be added to the DLT pipeline definition?

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

lucami — Fri, 13 Jun 2025 11:30:57 GMT

Hi @HoussemBL, I had the same issue. As far as I know, Automatic Liquid Clustering on DLT is in private preview, so I would suggest contacting your sales representative to enable it 🙂

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

nikhilj0421 — Fri, 13 Jun 2025 13:03:04 GMT

@HoussemBL, you can check whether PO (Predictive Optimization) is enabled for the target catalog of the DLT pipeline.
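A minimal sketch of such a check, assuming the DESCRIBE ... EXTENDED output contains a row labelled "Predictive Optimization" (the label is an assumption here, so compare it with what your workspace actually returns). The catalog, schema and table names are placeholders, and the helper functions are hypothetical.

```python
def quote_identifier(name):
    """Backtick-quote an identifier for use in a SQL statement."""
    return "`" + name.replace("`", "``") + "`"


def describe_statements(catalog, schema, table):
    """Return DESCRIBE statements for the catalog, schema and table levels."""
    c, s, t = (quote_identifier(part) for part in (catalog, schema, table))
    return [
        f"DESCRIBE CATALOG EXTENDED {c}",
        f"DESCRIBE SCHEMA EXTENDED {c}.{s}",
        f"DESCRIBE TABLE EXTENDED {c}.{s}.{t}",
    ]


def find_property(rows, label):
    """Return the value of the first row whose first column matches label."""
    for row in rows:
        if str(row[0]).strip().lower() == label.lower():
            return str(row[1]).strip()
    return None


def report_predictive_optimization(spark, catalog, schema, table):
    """Print the PO setting found at each level (needs a live Spark session)."""
    for statement in describe_statements(catalog, schema, table):
        rows = spark.sql(statement).collect()
        # Assumption: the row label is "Predictive Optimization".
        print(statement, "->", find_property(rows, "Predictive Optimization"))
```

In a notebook this would be called as report_predictive_optimization(spark, "my_catalog", "my_schema", "my_table"). A None result means no matching row was found at that level.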

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

Alex006 — Sat, 14 Jun 2025 11:15:01 GMT

Same issue here. I have activated PO on the specific schema where the materialized view resides, following these instructions: https://docs.databricks.com/aws/en/optimizations/predictive-optimization#check-whether-predictive-optimization-is-enabled
- This doesn't help with the issue.

Problem hypothesis: DLT (recently renamed to Lakeflow Declarative Pipelines) is not creating Unity Catalog managed tables, which are a precondition for Predictive Optimization, which in turn is a precondition for automatic liquid clustering.

Context:
- Predictive Optimization is enabled on the account and on the specific Unity Catalog schemas used.
- Other tables in the schemas (not created by DLT) are Unity Catalog managed, and the Unity Catalog UI shows the validation.

[Image: proof of PO being activated for the schema]

Question:
- Is DLT not capable of creating Unity Catalog managed tables?
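One way to test this hypothesis is to list the table types Unity Catalog reports for the schema. The sketch below assumes that the catalog's information_schema.tables view exposes table_name and table_type columns, and that DLT-created objects show up there with their own type values. The actual values are not confirmed in this thread, and the helper names are hypothetical.

```python
from collections import defaultdict

TABLE_TYPE_QUERY = (
    "SELECT table_name, table_type "
    "FROM {catalog}.information_schema.tables "
    "WHERE table_schema = :schema"
)


def quote_identifier(name):
    """Backtick-quote an identifier for use in a SQL statement."""
    return "`" + name.replace("`", "``") + "`"


def table_type_query(catalog):
    """Fill in the catalog; the schema stays a named parameter (:schema)."""
    return TABLE_TYPE_QUERY.format(catalog=quote_identifier(catalog))


def group_by_type(rows):
    """Group (table_name, table_type) pairs into {table_type: [table_name, ...]}."""
    grouped = defaultdict(list)
    for table_name, table_type in rows:
        grouped[table_type].append(table_name)
    return dict(grouped)


def list_table_types(spark, catalog, schema):
    """Run the query on a live Spark session and group the result."""
    rows = spark.sql(table_type_query(catalog), args={"schema": schema}).collect()
    return group_by_type((row[0], row[1]) for row in rows)
```

If the DLT-created materialized views and streaming tables appear under a different type than the regular managed tables in the same schema, that would be consistent with the hypothesis above, though it would not prove that the type is what triggers the PO error.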

Re: DLT Pipeline & Automatic Liquid Clustering Syntax

jsturgeon — Thu, 21 Aug 2025 16:35:47 GMT

Is there a resolution to this? I am having the same problem. I can create tables with cluster by auto, but the MVs fail with an error saying I need to enable PO. This was working yesterday and is working in other environments.