<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cannot apply liquid clustering via DLT pipeline in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/120710#M46235</link>
    <description>&lt;P&gt;Hi everyone, in our project we are trying to implement liquid clustering. We are testing it with a test table called status_update, where we need to update the status for different market IDs. We are trying to update the status_update table in parallel using the UPDATE command: &lt;CODE&gt;spark.sql(f"update status_update set status='{status}' where mkt_id={mkt_id}")&lt;/CODE&gt;. When we run the notebook in parallel for different market IDs, we encounter a concurrency issue.&lt;/P&gt;</description>
    <pubDate>Mon, 02 Jun 2025 11:49:53 GMT</pubDate>
    <dc:creator>Anand13</dc:creator>
    <dc:date>2025-06-02T11:49:53Z</dc:date>
    <item>
      <title>Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/118812#M45718</link>
      <description>&lt;P&gt;I want to use liquid clustering on a materialised view created via a DLT pipeline; however, there doesn't appear to be a valid way to do this.&lt;/P&gt;&lt;P&gt;Via table properties:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;@dlt.table(
    name="&amp;lt;table name&amp;gt;",
    comment="&amp;lt;table description&amp;gt;",
    table_properties={
        "delta.clusterBy": "AUTO",
        ...
    }
)&lt;/LI-CODE&gt;&lt;P&gt;The above code produces the error:&lt;/P&gt;&lt;DIV class=""&gt;&lt;P class="lia-indent-padding-left-30px"&gt;Unknown configuration was specified: delta.clusterBy&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;DELTA_UNKNOWN_CONFIGURATION Unknown configuration was specified: delta.clusterBy&lt;/P&gt;&lt;/DIV&gt;&lt;P&gt;&lt;SPAN class=""&gt;Suggestion from Genie:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;@dlt.table(
    name="&amp;lt;table_name&amp;gt;",
    comment="&amp;lt;table description&amp;gt;",
    table_properties={
        "delta.liquidClustering.enabled": "true",
        ...
    }
)&lt;/LI-CODE&gt;&lt;DIV class=""&gt;Produces the same error:&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;P class="lia-indent-padding-left-30px"&gt;Unknown configuration was specified: delta.liquidClustering.enabled&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;DELTA_UNKNOWN_CONFIGURATION Unknown configuration was specified: delta.liquidClustering.enabled&lt;/P&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Further suggestion from Genie is to use a CLUSTER BY clause:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;LI-CODE lang="markup"&gt;# Enable liquid clustering
spark.sql("ALTER TABLE network_banded_usage CLUSTER BY AUTO")&lt;/LI-CODE&gt;&lt;P&gt;This produces the error:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;'${command}' is not supported in spark.sql("...") API in DLT Python. Supported command: ${supportedCommands}.&lt;BR /&gt;UNSUPPORTED_SPARK_SQL_COMMAND '${command}' is not supported in spark.sql("...") API in DLT Python. Supported command: ${supportedCommands}.&lt;/P&gt;&lt;P&gt;I think this is a bug. Has anyone got liquid clustering enabled via DLT?&lt;/P&gt;</description>
      <pubDate>Sun, 11 May 2025 23:53:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/118812#M45718</guid>
      <dc:creator>TamD</dc:creator>
      <dc:date>2025-05-11T23:53:33Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/118942#M45748</link>
      <description>&lt;DIV class="paragraph"&gt;As of now, liquid clustering cannot be enabled directly on materialized views created via a Delta Live Tables (DLT) pipeline. Attempts to set table properties such as &lt;CODE&gt;"delta.clusterBy"&lt;/CODE&gt; or &lt;CODE&gt;"delta.liquidClustering.enabled"&lt;/CODE&gt; produce errors because these configurations are not supported. Moreover, running a &lt;CODE&gt;CLUSTER BY&lt;/CODE&gt; command like &lt;CODE&gt;ALTER TABLE network_banded_usage CLUSTER BY AUTO&lt;/CODE&gt; through the &lt;CODE&gt;spark.sql()&lt;/CODE&gt; API also fails, because Python-based DLT pipelines do not support that SQL command.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Currently, neither DLT pipeline syntax nor the associated commands can enable liquid clustering directly on materialized views. However, liquid clustering is supported for Delta Lake tables managed through the DLT Preview and Current channels. The actual clustering does not happen on write; it is applied during DLT maintenance jobs or when OPTIMIZE is run manually.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps, Lou.&lt;/DIV&gt;</description>
      <pubDate>Mon, 12 May 2025 18:33:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/118942#M45748</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-05-12T18:33:10Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119103#M45796</link>
      <description>&lt;P&gt;Thanks, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&lt;/P&gt;&lt;P&gt;My understanding is that DLT only allows for materialized views and streaming tables. When you say, "liquid clustering is supported for Delta Lake tables managed through DLT Preview and Current channels", do you mean that liquid clustering is only supported for DLT streaming tables?&lt;/P&gt;&lt;P&gt;Our use case requires a MERGE, which is why I was attempting to use a mat view.&amp;nbsp; Streaming tables are APPEND only, and so are not suitable for this. This sounds like if we want to take advantage of liquid clustering (or, any kind of clustering?) for a table which will be receiving updates, we can't use DLT.&amp;nbsp; Can you confirm?&amp;nbsp;&lt;/P&gt;&lt;P&gt;I note that OPTIMIZE is meant to be taken care of by Predictive Optimization, now on by default.&lt;/P&gt;&lt;P&gt;Do you know if there are plans to allow for liquid clustering on DLT mat views in a future release?&lt;/P&gt;</description>
      <pubDate>Tue, 13 May 2025 23:17:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119103#M45796</guid>
      <dc:creator>TamD</dc:creator>
      <dc:date>2025-05-13T23:17:54Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119184#M45813</link>
      <description>&lt;P&gt;Databricks Delta Live Tables (DLT) supports liquid clustering for both streaming tables and materialized views (MVs), not just streaming tables. This means liquid clustering is available for tables managed through DLT in both the Preview and Current channels, including materialized views created via DLT pipelines.&lt;/P&gt;
&lt;P&gt;As for the MERGE statement:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="paragraph"&gt;Merge does work with Delta Live Tables (DLT) pipelines but not as a direct statement; instead, the functionality is provided via the &lt;CODE&gt;APPLY CHANGES INTO&lt;/CODE&gt; operation. This operation serves as the equivalent of the &lt;CODE&gt;MERGE INTO&lt;/CODE&gt; command for Delta Lake tables, enabling users to process updates, inserts, and deletes from source tables.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Key details about &lt;CODE&gt;APPLY CHANGES INTO&lt;/CODE&gt; in DLT pipelines:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;1. &lt;STRONG&gt;Supported Operations&lt;/STRONG&gt;: It applies &lt;CODE&gt;INSERT&lt;/CODE&gt; and &lt;CODE&gt;UPDATE&lt;/CODE&gt; events from the source dataset by matching primary keys and event sequencing to maintain data consistency. DELETE operations can also be handled using statements like &lt;CODE&gt;APPLY AS DELETE WHEN&lt;/CODE&gt; in SQL, or its Python equivalent.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;2. &lt;STRONG&gt;Pipeline Compatibility&lt;/STRONG&gt;: The target of &lt;CODE&gt;APPLY CHANGES INTO&lt;/CODE&gt; must be a streaming table declared in the pipeline; it cannot be a materialized view.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;3. &lt;STRONG&gt;Configuration Requirement&lt;/STRONG&gt;: The operation needs to be explicitly enabled in the pipeline settings by adding and enabling the &lt;CODE&gt;applyChanges&lt;/CODE&gt; configuration.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;This approach bypasses limitations of simple streaming table operations, which are restricted to append-only queries, making it suitable for handling use cases requiring incremental updates to tables.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
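&lt;DIV class="paragraph"&gt;To make the key matching and event sequencing concrete, here is a minimal in-memory sketch in plain Python. It is not the DLT implementation and calls no DLT API: the function name apply_changes, its parameters, and the sample events are hypothetical, and it only imitates the semantics described above for a single batch of events.&lt;/DIV&gt;

```python
from typing import Any, Callable, Dict, Iterable, Optional

Row = Dict[str, Any]


def apply_changes(
    target: Dict[Any, Row],
    events: Iterable[Row],
    key: str,
    sequence_by: str,
    delete_when: Optional[Callable[[Row], bool]] = None,
) -> Dict[Any, Row]:
    """Toy imitation of upsert-by-key with event sequencing (hypothetical helper).

    Assumption: all events belong to one batch, so sorting them by the
    sequencing column is enough to let the latest event win.
    """
    result = dict(target)  # work on a copy; the caller's table is untouched
    # Apply events in sequence order so a later event overrides an earlier
    # one, even when the events arrive out of order.
    for event in sorted(events, key=lambda e: e[sequence_by]):
        if delete_when is not None and delete_when(event):
            result.pop(event[key], None)  # DELETE: drop the row if present
        else:
            result[event[key]] = event  # INSERT or UPDATE: upsert by key
    return result


# Hypothetical change events; "seq" plays the role of the sequencing column.
events = [
    {"mkt_id": 1, "status": "open", "seq": 1, "op": "UPSERT"},
    {"mkt_id": 2, "status": "open", "seq": 2, "op": "UPSERT"},
    {"mkt_id": 1, "status": "closed", "seq": 4, "op": "UPSERT"},
    {"mkt_id": 1, "status": "paused", "seq": 3, "op": "UPSERT"},  # arrives late
    {"mkt_id": 2, "status": "open", "seq": 5, "op": "DELETE"},
]

table = apply_changes(
    {},
    events,
    key="mkt_id",
    sequence_by="seq",
    delete_when=lambda e: e["op"] == "DELETE",
)
print(table)
```

&lt;DIV class="paragraph"&gt;With these sample events, the late-arriving seq 3 update is overridden by the seq 4 update, and the seq 5 delete removes market 2, so only market 1 remains, with status closed. A real pipeline also tracks sequencing across batches, which this sketch leaves out.&lt;/DIV&gt;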
&lt;DIV class="paragraph"&gt;Hope this helps, Lou.&lt;/DIV&gt;</description>
      <pubDate>Wed, 14 May 2025 13:39:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119184#M45813</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-05-14T13:39:26Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119271#M45825</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115261"&gt;@TamD&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;I was able to enable Liquid Clustering via DLT using the below Syntax.&lt;BR /&gt;Try it and let me know if you face any issues:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt

@dlt.table(
    comment="DLT TABLE WITH LC ENABLED",
    cluster_by = ["column1","more_columns"]
)
def name_of_the_table():
    df=logic_to_create_the_table
    return df  &lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 15 May 2025 05:26:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119271#M45825</guid>
      <dc:creator>RiyazAliM</dc:creator>
      <dc:date>2025-05-15T05:26:32Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119659#M45941</link>
      <description>&lt;P&gt;Thanks &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/15469"&gt;@RiyazAliM&lt;/a&gt;. I want to use CLUSTER BY AUTO, because the data will be queried and aggregated in several different ways by different business users. I did try your code above anyway, specifying the columns to cluster by. The pipeline ran without error, but SHOW TBLPROPERTIES does not show that any clustering has been applied. These are the only properties set on the table:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;delta.autoOptimize.autoCompact&lt;BR /&gt;delta.autoOptimize.optimizeWrite&lt;BR /&gt;delta.enableChangeDataFeed&lt;BR /&gt;delta.minReaderVersion&lt;BR /&gt;delta.minWriterVersion&lt;BR /&gt;pipelines.pipelineId&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;, if automatic liquid clustering is applied to a DLT table during DLT maintenance jobs, which I believe are managed automatically by Databricks, when should I expect to see clustering information in the table properties?&lt;/P&gt;&lt;P&gt;Cheers!&lt;/P&gt;</description>
      <pubDate>Mon, 19 May 2025 21:46:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/119659#M45941</guid>
      <dc:creator>TamD</dc:creator>
      <dc:date>2025-05-19T21:46:23Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/120642#M46215</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115261"&gt;@TamD&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;DLT doesn't currently support automatic liquid clustering. I've tried adding clusterByAuto='true' to the table properties for my DLT pipelines, and the pipeline builds successfully.&lt;/P&gt;&lt;P&gt;However, I don't think it actually works. I feel it's just treated as a customized tag in the table properties, as I have a 300GB streaming DLT table with this setting, and there are no clustering keys chosen when I run DESCRIBE TABLE EXTENDED.&lt;/P&gt;</description>
      <pubDate>Sat, 31 May 2025 08:14:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/120642#M46215</guid>
      <dc:creator>Mardi_Lo</dc:creator>
      <dc:date>2025-05-31T08:14:13Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/120710#M46235</link>
      <description>&lt;P&gt;Hi everyone, in our project we are trying to implement liquid clustering. We are testing it with a test table called status_update, where we need to update the status for different market IDs. We are trying to update the status_update table in parallel using the UPDATE command: &lt;CODE&gt;spark.sql(f"update status_update set status='{status}' where mkt_id={mkt_id}")&lt;/CODE&gt;. When we run the notebook in parallel for different market IDs, we encounter a concurrency issue.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Jun 2025 11:49:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/120710#M46235</guid>
      <dc:creator>Anand13</dc:creator>
      <dc:date>2025-06-02T11:49:53Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot apply liquid clustering via DLT pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/120711#M46236</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi BigRoux&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In our project we are trying to implement liquid clustering. We are testing it with a test table called status_update, where we need to update the status for different market IDs. We are trying to update the status_update table in parallel using the UPDATE command: &lt;CODE&gt;spark.sql(f"update status_update set status='{status}' where mkt_id={mkt_id}")&lt;/CODE&gt;. When we run the notebook in parallel for different market IDs, we encounter a concurrency issue.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 02 Jun 2025 11:53:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cannot-apply-liquid-clustering-via-dlt-pipeline/m-p/120711#M46236</guid>
      <dc:creator>Anand13</dc:creator>
      <dc:date>2025-06-02T11:53:47Z</dc:date>
    </item>
  </channel>
</rss>

