<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Best practices for Liquid clustering and z-ordering for existing streaming delta tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-practices-for-liquid-clustering-and-z-ordering-for-existing/m-p/89489#M37824</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117376"&gt;@filipniziol&lt;/a&gt;, thank you for your quick response. Adjusting the pipeline configuration instead of running the ALTER TABLE statement directly against the DLT table makes sense.&lt;BR /&gt;&lt;BR /&gt;Regarding liquid clustering on DLT tables: if we set the "Channel" setting to "Preview", would that let us run ALTER TABLE against DLT tables? If not, do you have any thoughts on how else we can apply liquid clustering to an existing DLT table?&lt;BR /&gt;&lt;BR /&gt;My goal is to avoid refreshing the DLT tables, since that would require huge amounts of data writes.&lt;/P&gt;</description>
    <pubDate>Wed, 11 Sep 2024 14:30:48 GMT</pubDate>
    <dc:creator>prakash360</dc:creator>
    <dc:date>2024-09-11T14:30:48Z</dc:date>
    <item>
      <title>Best practices for Liquid clustering and z-ordering for existing streaming delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-liquid-clustering-and-z-ordering-for-existing/m-p/89382#M37775</link>
      <description>&lt;P&gt;Hello, I have been tasked with optimizing some of our existing Delta Lake tables in Databricks. I was able to run the following statement on some of our Delta tables, but not on some of our streaming tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;ALTER TABLE table_name
CLUSTER BY (column1, column2);&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the error I receive when I run the above statement against our streaming tables:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;[STREAMING_TABLE_OPERATION_NOT_ALLOWED.UNSUPPORTED_OPERATION] The operation ALTER TABLE is not allowed: The operation is not supported on Streaming Tables. SQLSTATE: 42601&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;My goal is to apply liquid clustering to all tables, but it seems we may have to rebuild streaming tables from scratch to apply liquid clustering to them. Is that right? As an alternative, I thought I could apply z-ordering to the streaming tables, since I was able to run the following SQL against them. Now I'm wondering what the right way to apply z-ordering is. Can I just keep running the following SQL every day, or should I set the configuration properties below in my code?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;OPTIMIZE table_name ZORDER BY column1, column2;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Or&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;pipelines.autoOptimize.managed = true
pipelines.autoOptimize.zOrderCols = "column1,column2"&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Source: &lt;A href="https://docs.databricks.com/en/delta-live-tables/properties.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta-live-tables/properties.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I would appreciate feedback and guidance on the best path forward for optimizing my Delta table to limit file scans.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Sep 2024 21:59:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-liquid-clustering-and-z-ordering-for-existing/m-p/89382#M37775</guid>
      <dc:creator>prakash360</dc:creator>
      <dc:date>2024-09-10T21:59:21Z</dc:date>
    </item>
    <item>
      <title>Re: Best practices for Liquid clustering and z-ordering for existing streaming delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-liquid-clustering-and-z-ordering-for-existing/m-p/89386#M37777</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/120202"&gt;@prakash360&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;There are a couple of things here:&lt;BR /&gt;1. You do not want to run ALTER TABLE commands on DLT tables. The idea of DLT is that its tables are fully managed by the DLT pipeline. Try to modify the DLT pipeline itself instead.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2. At least try liquid clustering, as this is a new, recommended feature. However, liquid clustering in DLT was added just &lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/en/release-notes/delta-live-tables/2024/33/index.html" target="_self"&gt;2 weeks ago&lt;/A&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;3. To make it work, use the preview 15.2 runtime and set the channel of your DLT pipeline to &lt;A href="https://docs.databricks.com/en/delta-live-tables/properties.html#config-settings" target="_self"&gt;preview:&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="filipniziol_0-1726007101075.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11061i9159687BBEDAC093/image-size/medium?v=v2&amp;amp;px=400" role="button" title="filipniziol_0-1726007101075.png" alt="filipniziol_0-1726007101075.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Sep 2024 22:26:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-liquid-clustering-and-z-ordering-for-existing/m-p/89386#M37777</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2024-09-10T22:26:36Z</dc:date>
    </item>
    <item>
      <title>Re: Best practices for Liquid clustering and z-ordering for existing streaming delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-liquid-clustering-and-z-ordering-for-existing/m-p/89489#M37824</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117376"&gt;@filipniziol&lt;/a&gt;, thank you for your quick response. Adjusting the pipeline configuration instead of running the ALTER TABLE statement directly against the DLT table makes sense.&lt;BR /&gt;&lt;BR /&gt;Regarding liquid clustering on DLT tables: if we set the "Channel" setting to "Preview", would that let us run ALTER TABLE against DLT tables? If not, do you have any thoughts on how else we can apply liquid clustering to an existing DLT table?&lt;BR /&gt;&lt;BR /&gt;My goal is to avoid refreshing the DLT tables, since that would require huge amounts of data writes.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2024 14:30:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-liquid-clustering-and-z-ordering-for-existing/m-p/89489#M37824</guid>
      <dc:creator>prakash360</dc:creator>
      <dc:date>2024-09-11T14:30:48Z</dc:date>
    </item>
  </channel>
</rss>

