<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cluster by auto pyspark in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/138181#M10979</link>
    <description>&lt;P&gt;&lt;SPAN&gt;This is now supported on DBR 16.4+ for both the DataFrameWriterV1 and DataFrameWriterV2 APIs, as well as the DLT and streaming APIs. More details are here:&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/clustering" target="_blank" rel="nofollow noopener noreferrer"&gt;https://docs.databricks.com/aws/en/delta/clustering&lt;/A&gt;&lt;SPAN&gt;. Basically, use the option &lt;/SPAN&gt;&lt;SPAN&gt;`.option("clusterBy.auto", "true")`&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 07 Nov 2025 20:43:21 GMT</pubDate>
    <dc:creator>parimarjan</dc:creator>
    <dc:date>2025-11-07T20:43:21Z</dc:date>
    <item>
      <title>Cluster by auto pyspark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/115251#M9333</link>
      <description>&lt;P&gt;I can find documentation for enabling automatic liquid clustering with SQL code: CLUSTER BY AUTO. But how do I do this with PySpark? I know I can do it with spark.sql("ALTER TABLE CLUSTER BY AUTO"), but ideally I want to pass it as an .option().&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Apr 2025 09:40:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/115251#M9333</guid>
      <dc:creator>htd350</dc:creator>
      <dc:date>2025-04-11T09:40:13Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster by auto pyspark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/115310#M9334</link>
      <description>&lt;P&gt;To enable automatic liquid clustering with PySpark and pass it as an `.option()` during table creation or modification: you currently cannot call a `.clusterBy("AUTO")` method directly in PySpark's `DataFrameWriter` API. However, there are workarounds:&lt;/P&gt;
&lt;P&gt;1. Using SQL via `spark.sql()`&lt;BR /&gt;The simplest way to enable automatic liquid clustering is by executing an SQL statement:&lt;BR /&gt;```python&lt;BR /&gt;spark.sql("ALTER TABLE table_name CLUSTER BY AUTO")&lt;BR /&gt;```&lt;BR /&gt;This enables automatic liquid clustering on an existing Delta table.&lt;/P&gt;
&lt;P&gt;2. Using the DeltaTableBuilder API&lt;BR /&gt;If you're creating a new table programmatically, you can use the DeltaTableBuilder API in PySpark to specify clustering options:&lt;BR /&gt;```python&lt;BR /&gt;from delta.tables import DeltaTable&lt;/P&gt;
&lt;P&gt;DeltaTable.create(spark) \&lt;BR /&gt;.tableName("table_name") \&lt;BR /&gt;.addColumn("col1", "STRING") \&lt;BR /&gt;.addColumn("col2", "INT") \&lt;BR /&gt;.property("delta.autoOptimize.optimizeWrite", "true") \&lt;BR /&gt;.property("delta.autoOptimize.autoCompact", "true") \&lt;BR /&gt;.property("delta.clusterBy.auto", "true") \&lt;BR /&gt;.execute()&lt;BR /&gt;```&lt;BR /&gt;Here, `.property("delta.clusterBy.auto", "true")` ensures that automatic liquid clustering is enabled.&lt;/P&gt;
&lt;P&gt;3. Using `DataFrameWriterV2` for Table Creation&lt;BR /&gt;If you're creating a table from an existing DataFrame, you can use the `DataFrameWriterV2` API:&lt;BR /&gt;```python&lt;BR /&gt;df.writeTo("table_name") \&lt;BR /&gt;.using("delta") \&lt;BR /&gt;.option("clusterBy.auto", "true") \&lt;BR /&gt;.create()&lt;BR /&gt;```&lt;BR /&gt;This approach allows you to specify the `clusterBy.auto` option directly during the write operation.&lt;/P&gt;
&lt;P&gt;Important Notes&lt;BR /&gt;- Automatic liquid clustering requires Databricks Runtime 15.4 LTS or higher.&lt;BR /&gt;- Ensure your table is managed by Unity Catalog if using automatic clustering.&lt;BR /&gt;- For existing tables, clustering does not apply retroactively to old data unless you run `OPTIMIZE FULL`.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Apr 2025 16:24:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/115310#M9334</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-04-11T16:24:37Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster by auto pyspark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/116573#M9888</link>
      <description>&lt;P&gt;How about if I use&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;table&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;Is it possible to configure automatic liquid clustering in the &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;table_properties?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 25 Apr 2025 12:35:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/116573#M9888</guid>
      <dc:creator>claudiayuan</dc:creator>
      <dc:date>2025-04-25T12:35:34Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster by auto pyspark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/116594#M9889</link>
      <description>&lt;P&gt;Not at the moment.&amp;nbsp; You have to use the SQL DDL commands, either at table creation or via an ALTER TABLE command. Hope this helps, Louis.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Apr 2025 15:28:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/116594#M9889</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-04-25T15:28:06Z</dc:date>
    </item>
    <item>
      <title>Re: Cluster by auto pyspark</title>
      <link>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/138180#M10978</link>
      <description>&lt;P&gt;This is now supported on DBR 16.4+ for both the DataFrameWriterV1 and DataFrameWriterV2 APIs, as well as the DLT and streaming APIs. More details are here:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/delta/clustering" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/clustering&lt;/A&gt;. Basically, use the option &lt;SPAN&gt;`.option("clusterBy.auto", "true")`&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Nov 2025 20:42:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/cluster-by-auto-pyspark/m-p/138180#M10978</guid>
      <dc:creator>parimarjan</dc:creator>
      <dc:date>2025-11-07T20:42:34Z</dc:date>
    </item>
  </channel>
</rss>

