topic Automatic Liquid Clustering and PO in Community Articles

Automatic Liquid Clustering and PO

lucami — Tue, 10 Jun 2025 09:26:00 GMT

I spent some time to understand how to use automatic liquid clustering with dlt pipelines. Hope this can help you as well.

Enable Predictive Optimization
Use this code:

# Enabling Automatic Liquid Clustering on a new table @dlt.table(cluster_by_auto=True) def tbl_with_auto(): return spark.range(5) # Manually choosing a clustering key initially, followed by automatic clustering @dlt.table(cluster_by_auto=True, cluster_by=["id"]) def tbl_with_auto_and_initial_hint(): return spark.range(5)

or in SQL

CREATE OR REPLACE MATERIALIZED VIEW mat_view_name CLUSTER BY AUTO AS SELECT * FROM source_table;

If you know how to specify initial clustering key with Spark SQL, let me know 🙂 .

Re: Automatic Liquid Clustering and PO

Addy0_ — Tue, 10 Jun 2025 11:30:23 GMT

Thank you for sharing this!

Just to add:
To enable automatic liquid clustering & set clustering columns for an existing table

ALTER TABLE table_name CLUSTER BY (column1, column2) AUTO;

Re: Automatic Liquid Clustering and PO

lucami — Tue, 10 Jun 2025 11:51:18 GMT

Hi @Addy0_, thanks for sharing how to set it for existing table. Unfortunately, I think ALTER cannot be used with materialized view and streaming tables defined in dlt pipelines.

I was looking for something similar to

@dlt.table(cluster_by_auto=True, cluster_by=["id"])

that in sql would look like

CREATE MATERIALIZED VIEW tbl_with_auto_and_initial_hint CLUSTER BY id AUTO AS SELECT id, fare_amount FROM source_table;

But this code raises syntax error.