
DLT Pipeline & Automatic Liquid Clustering Syntax

HoussemBL
New Contributor III

Hi everyone,

I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality.

However, I'm having trouble figuring out the correct syntax to integrate automatic liquid clustering within my DLT pipeline. I've tried the following code, but it doesn't seem to be working as expected.

 

import dlt

dlt.create_streaming_table(
    "table_a",
    schema="""
        id STRING NOT NULL,
        description STRING NOT NULL,
        is_current BOOLEAN NOT NULL
    """,
    cluster_by=["auto"],  # my attempt at enabling automatic liquid clustering
    comment="table a with automatic liquid clustering",
)

Could someone please provide an example of the correct syntax for using automatic liquid clustering within a Databricks DLT pipeline? Any guidance or best practices would be greatly appreciated!

Thanks in advance!

11 REPLIES

notwarte
New Contributor III

Hi!

I think it's worth trying the same syntax as shown here: https://docs.databricks.com/aws/en/delta/clustering?language=Python 

HoussemBL
New Contributor III

Thanks a lot for your reply, @notwarte.
I can't really use the links you suggested because I am implementing a DLT pipeline. The DLT Python syntax is different, especially when it comes to creating tables.

RiyazAliM
Honored Contributor

Hey @HoussemBL 

You're correct that DLT does not support automatic liquid clustering (auto LC). You can pass any columns in cluster_by, but if you set it to auto, it throws an error complaining that auto is not present in the list of columns.

Maybe altering the table to set or reset the liquid clustering is the only option left for now.
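
A minimal sketch of that workaround, assuming the table can be altered outside the pipeline. `ALTER TABLE ... CLUSTER BY AUTO` and `CLUSTER BY NONE` are Databricks SQL statements for Delta tables, but whether a DLT-managed table accepts them is not confirmed in this thread. The helper name `clustering_statement` and the table name are hypothetical; the function only builds the SQL string.

```python
def clustering_statement(table_name, mode="auto", columns=None):
    """Build an ALTER TABLE statement that sets or resets liquid clustering."""
    if mode == "auto":
        clause = "AUTO"
    elif mode == "none":
        clause = "NONE"
    elif mode == "columns":
        if not columns:
            raise ValueError("columns is required when mode='columns'")
        clause = "(" + ", ".join(columns) + ")"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"ALTER TABLE {table_name} CLUSTER BY {clause}"


# Hypothetical three-level table name, for illustration only.
stmt = clustering_statement("my_catalog.my_schema.table_a")
print(stmt)
# In a notebook (not inside the pipeline source): spark.sql(stmt)
```

Passing `mode="none"` produces the statement that resets clustering, and `mode="columns"` sets explicit keys.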

Let me know your thoughts.

Cheers!

Riz

mai_luca
New Contributor III

It works with SQL syntax (using CLUSTER BY AUTO), but not with PySpark.

mai_luca
New Contributor III

You can now use Automatic Liquid Clustering with Python:

import dlt

# Enabling Automatic Liquid Clustering on a new table
@dlt.table(cluster_by_auto=True)
def tbl_with_auto():
    return spark.range(5)

# Manually choosing a clustering key initially, followed by automatic clustering
@dlt.table(cluster_by_auto=True, cluster_by=["id"])
def tbl_with_auto_and_initial_hint():
    return spark.range(5)
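
For the `dlt.create_streaming_table` call from the original question, the same flag might be applied as sketched below. This assumes `create_streaming_table` accepts `cluster_by_auto` the same way `@dlt.table` does, which is not shown in this thread. The helper `streaming_table_kwargs` is hypothetical and only builds the keyword arguments, so it can be checked outside a pipeline.

```python
def streaming_table_kwargs(name, schema, comment, initial_keys=None):
    """Keyword arguments for a streaming table with automatic liquid clustering."""
    kwargs = {
        "name": name,
        "schema": schema,
        "comment": comment,
        # Assumption: create_streaming_table takes this flag like @dlt.table does.
        "cluster_by_auto": True,
    }
    if initial_keys:
        # Optional initial clustering keys, used as a starting hint.
        kwargs["cluster_by"] = list(initial_keys)
    return kwargs


kwargs = streaming_table_kwargs(
    name="table_a",
    schema="id STRING NOT NULL, description STRING NOT NULL, is_current BOOLEAN NOT NULL",
    comment="table a with automatic liquid clustering",
)
# Inside the pipeline source:
#   import dlt
#   dlt.create_streaming_table(**kwargs)
```

Note that `"auto"` never appears in `cluster_by` here; that list is reserved for real column names.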

HoussemBL
New Contributor III

Hi @mai_luca 

Unfortunately, I'm still getting an error when I run your code. Here's the specific error message:

 
org.apache.spark.sql.AnalysisException: [CLUSTER_BY_AUTO_REQUIRES_PREDICTIVE_OPTIMIZATION] 
CLUSTER BY AUTO requires Predictive Optimization to be enabled.
SQLSTATE: 56038

Additional context:

  • Predictive Optimization is enabled in our Databricks account.

  • According to the documentation, this feature should be automatically enabled for all workspaces, catalogs, and tables.

Is there any extra setting that should be added to the DLT pipeline definition?

mai_luca
New Contributor III

Hi @HoussemBL, I had the same issue. As far as I know, automatic liquid clustering on DLT is in private preview, so I'd suggest contacting your sales representative to enable it 🙂 

nikhilj0421
Databricks Employee

@HoussemBL, you can check whether Predictive Optimization (PO) is enabled for the target catalog of the DLT pipeline.
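
One way to run that check, as a rough sketch: `DESCRIBE CATALOG EXTENDED` (or `DESCRIBE SCHEMA EXTENDED`) reports a Predictive Optimization field, and `ALTER ... ENABLE PREDICTIVE OPTIMIZATION` turns it on explicitly. The helper name `po_statements` and the catalog name `main` are hypothetical; the function only builds the SQL strings.

```python
def po_statements(securable_type, name):
    """Build SQL to inspect and enable Predictive Optimization on a catalog or schema."""
    kind = securable_type.upper()
    if kind not in {"CATALOG", "SCHEMA"}:
        raise ValueError("securable_type must be 'catalog' or 'schema'")
    return {
        # The output includes a Predictive Optimization field.
        "check": f"DESCRIBE {kind} EXTENDED {name}",
        "enable": f"ALTER {kind} {name} ENABLE PREDICTIVE OPTIMIZATION",
    }


# Hypothetical target catalog of the pipeline.
statements = po_statements("catalog", "main")
print(statements["check"])
# In a notebook: spark.sql(statements["check"]).show(truncate=False)
```

Running the check against both the target catalog and the target schema helps rule out a setting that is inherited as disabled at one level.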

Alex006
Contributor

Same issue here. I have activated PO on the specific schema where the materialized view resides, per these instructions: https://docs.databricks.com/aws/en/optimizations/predictive-optimization#check-whether-predictive-op...
- It doesn't help with the issue.

Problem hypothesis: DLT (recently renamed to Lakeflow Declarative Pipelines) is not creating Unity Catalog managed tables, which are a precondition for Predictive Optimization, which in turn is a precondition for automatic liquid clustering. 

Context:
- Predictive Optimization is enabled on the account and on the specific Unity Catalog schemas used.
- Other tables in the schemas (not created by DLT) are Unity Catalog managed, and Unity Catalog shows this in the UI. See image below:

Alex006_0-1749898948011.png

Proof of PO being activated for the schema

Alex006_1-1749899682047.png

 

Question
- Is DLT not capable of creating Unity Catalog managed tables?
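
One way to test this hypothesis is to compare the `Type` row that `DESCRIBE TABLE EXTENDED` reports for a pipeline-created table and for a regular table. The sketch below assumes the result is collected as rows of (col_name, data_type, comment); the helper name `table_type` and the sample rows are hypothetical, and the value reported for pipeline-created tables still needs to be verified in a real workspace.

```python
def table_type(describe_rows):
    """Return the 'Type' value from DESCRIBE TABLE EXTENDED output, or None."""
    in_details = False
    for col_name, data_type, _comment in describe_rows:
        label = col_name.strip()
        if label == "# Detailed Table Information":
            # Only rows after this marker are table metadata, so a data
            # column that happens to be named "Type" is not picked up.
            in_details = True
        elif in_details and label == "Type":
            return data_type.strip()
    return None


# Hypothetical rows, shaped like spark.sql("DESCRIBE TABLE EXTENDED ...").collect()
sample_rows = [
    ("id", "string", None),
    ("Type", "string", None),  # a data column named "Type"
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Name", "my_catalog.my_schema.table_a", ""),
    ("Type", "MANAGED", ""),
]
print(table_type(sample_rows))
```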



jsturgeon
New Contributor II

Is there a resolution to this? I am having the same problem. I can create tables with CLUSTER BY AUTO, but the materialized views (MVs) fail with an error saying I need to enable PO. This was working yesterday and is working in other environments.