<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: iceberg with partitionedBy option in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124370#M47165</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/57401"&gt;@yzhang&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;OK, I forgot that liquid clustering is not compatible with partitioning. But I have a couple of questions to clarify. You wrote in your reply that you were able to run the following command:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS&lt;/LI-CODE&gt;&lt;P&gt;If so, that means this table has already been created. Could you run the following command and tell us what you see in the result?&lt;BR /&gt;Is there anything in clusteringColumns?&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;DESCRIBE DETAIL csu_metastore_dev.iceberg.big_file_hcm;&lt;/LI-CODE&gt;&lt;P&gt;If the above command returns any clusteringColumns, then the following one won't work:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()&lt;/LI-CODE&gt;&lt;P&gt;If you want to partition a table that already exists and has liquid clustering enabled, you first need to turn off liquid clustering on that table.&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/clustering" target="_blank" rel="noopener"&gt;Use liquid clustering for tables - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1751918325511.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18032i77E6FA51B9895080/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1751918325511.png" alt="szymon_dybczak_0-1751918325511.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 07 Jul 2025 20:27:50 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-07-07T20:27:50Z</dc:date>
    <item>
      <title>iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124348#M47159</link>
      <description>&lt;P&gt;I am able to create a Unity Catalog Iceberg-format table:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df.writeTo(full_table_name).using("iceberg").create()&lt;/LI-CODE&gt;&lt;P&gt;However, if I add the partitionedBy option, I get an error:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df.writeTo(full_table_name).using("iceberg").partitionedBy("ingest_date").create()&lt;/LI-CODE&gt;&lt;P&gt;DELTA_CLUSTERING_COLUMN_MISSING_STATS: Liquid clustering requires clustering columns to have stats...&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jul 2025 17:18:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124348#M47159</guid>
      <dc:creator>yzhang</dc:creator>
      <dc:date>2025-07-07T17:18:05Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124349#M47160</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/57401"&gt;@yzhang&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;First, make sure that you are using Databricks Runtime 16.4 LTS or above (it is required for liquid clustering on Apache Iceberg tables).&lt;/P&gt;&lt;P&gt;Next, try running the following command:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;ANALYZE TABLE &amp;lt;table_name&amp;gt; COMPUTE DELTA STATISTICS&lt;/LI-CODE&gt;&lt;P&gt;You can also try turning off liquid clustering for that table altogether:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;ALTER TABLE &amp;lt;table_name&amp;gt; CLUSTER BY NONE;&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 07 Jul 2025 18:37:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124349#M47160</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-07T18:37:01Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124359#M47164</link>
      <description>&lt;P&gt;Thanks very much for the help.&lt;/P&gt;&lt;P&gt;1. Yes, the job runs on 16.4 LTS.&lt;/P&gt;&lt;P&gt;2. ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; The output is just one line: "ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS was successfully executed."&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; We have limited experience with Databricks; please advise what else I can run to provide more info.&lt;/P&gt;&lt;P&gt;3. ALTER TABLE table_name CLUSTER BY NONE&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; I don't see how this would help in my case. My problem is creating an Iceberg table with the partitionedBy option, and the ALTER command requires the table to exist first.&lt;/P&gt;&lt;P&gt;4. BTW, ChatGPT summarized my issue as follows; I'm not sure whether it is accurate.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Root Cause&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Unity Catalog appears to default to Delta Lake logic, even when USING ICEBERG is specified&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;If PARTITIONED BY (...) is included, UC treats it as a Delta Lake clustering directive, which expects column-level stats&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Since your column didn't have Delta-style stats yet (as Iceberg doesn't require them), Databricks throws a misleading Delta error, despite your intent to use Iceberg&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Why This Is Misleading&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;The error references Delta Liquid Clustering, which is a Delta Lake-only feature&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;But you are explicitly creating the table with USING ICEBERG&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Your ingest_date column did exist in the data, but it failed anyway&lt;BR /&gt;This implies that:&lt;BR /&gt;Even when specifying USING ICEBERG, Databricks internally applies Delta validations, including Liquid Clustering checks, especially when using Unity Catalog.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 07 Jul 2025 19:17:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124359#M47164</guid>
      <dc:creator>yzhang</dc:creator>
      <dc:date>2025-07-07T19:17:55Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124370#M47165</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/57401"&gt;@yzhang&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;OK, I forgot that liquid clustering is not compatible with partitioning. But I have a couple of questions to clarify. You wrote in your reply that you were able to run the following command:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;ANALYZE TABLE csu_metastore_dev.iceberg.big_file_hcm COMPUTE DELTA STATISTICS&lt;/LI-CODE&gt;&lt;P&gt;If so, that means this table has already been created. Could you run the following command and tell us what you see in the result?&lt;BR /&gt;Is there anything in clusteringColumns?&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;DESCRIBE DETAIL csu_metastore_dev.iceberg.big_file_hcm;&lt;/LI-CODE&gt;&lt;P&gt;If the above command returns any clusteringColumns, then the following one won't work:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()&lt;/LI-CODE&gt;&lt;P&gt;If you want to partition a table that already exists and has liquid clustering enabled, you first need to turn off liquid clustering on that table.&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/clustering" target="_blank" rel="noopener"&gt;Use liquid clustering for tables - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1751918325511.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18032i77E6FA51B9895080/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1751918325511.png" alt="szymon_dybczak_0-1751918325511.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jul 2025 20:27:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124370#M47165</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-07T20:27:50Z</dc:date>
    </item>
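    <!--
    The DESCRIBE DETAIL check suggested above can also be done programmatically. This is a minimal sketch, assuming one row of DESCRIBE DETAIL output has been converted to a plain Python dict; the helper name liquid_clustering_enabled is hypothetical and not a Databricks API.

```python
from typing import Any, Mapping


def liquid_clustering_enabled(detail: Mapping[str, Any]) -> bool:
    """Return True if a DESCRIBE DETAIL row lists any clustering columns.

    Assumption: `detail` is one DESCRIBE DETAIL row as a dict. A missing,
    None, or empty clusteringColumns value is treated as not clustered.
    """
    columns = detail.get("clusteringColumns") or []
    return len(columns) > 0


# Illustrative rows only; these are not output captured from the thread.
print(liquid_clustering_enabled({"clusteringColumns": ["ingest_date"]}))  # True
print(liquid_clustering_enabled({"clusteringColumns": []}))  # False
```

    On Databricks, such a row could be obtained with spark.sql("DESCRIBE DETAIL csu_metastore_dev.iceberg.big_file_hcm").first().asDict(); this has not been verified against an Iceberg table here.
    -->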
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124372#M47167</link>
      <description>&lt;P&gt;I am not trying to alter the table with the partitionedBy option. To clarify, I wanted to create a new Iceberg-format table with the partitionedBy option, but it failed with a Databricks error. I had to create the Iceberg table without partitionedBy instead.&lt;/P&gt;&lt;P&gt;clusteringColumns is an empty array [], and the properties from my schema are ((defaultTableFormat,ICEBERG)), so liquid clustering is not enabled.&lt;/P&gt;&lt;P&gt;Has any of you tried to reproduce this, if possible?&lt;/P&gt;&lt;PRE&gt;df.writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create()&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 Jul 2025 20:58:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124372#M47167</guid>
      <dc:creator>yzhang</dc:creator>
      <dc:date>2025-07-07T20:58:30Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124375#M47168</link>
      <description>&lt;P&gt;Yes, I tried to recreate a simple example, and in my case there was no issue.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_1-1751924096836.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18036i0CB894E980360B0D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_1-1751924096836.png" alt="szymon_dybczak_1-1751924096836.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jul 2025 21:35:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/124375#M47168</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-07T21:35:06Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/142806#M52022</link>
      <description>&lt;P&gt;I found some weird behavior here while creating a table using SQL.&lt;BR /&gt;If you are creating a new table and the partition column is the last column in the column list, it won't work, but if you place it earlier in the list, it works!&lt;BR /&gt;For example, the query below works:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;CREATE TABLE IF NOT EXISTS schema_name.table_name (
  id BIGINT,
  partition_column STRING,
  other_column1 DOUBLE,
  other_column2 DOUBLE
)
USING ICEBERG
PARTITIONED BY (partition_column);&lt;/LI-CODE&gt;&lt;P&gt;But the following one gives the same error you got:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;CREATE TABLE IF NOT EXISTS schema_name.table_name (
  id BIGINT,
  other_column1 DOUBLE,
  other_column2 DOUBLE,
  partition_column STRING
)
USING ICEBERG
PARTITIONED BY (partition_column);&lt;/LI-CODE&gt;&lt;P&gt;So in PySpark you can try the same thing: keep the column you will partition by somewhere other than the last position in the column order.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Jan 2026 08:04:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/142806#M52022</guid>
      <dc:creator>LazyGenius</dc:creator>
      <dc:date>2026-01-02T08:04:40Z</dc:date>
    </item>
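    <!--
    The column-order workaround described above can be applied to a DataFrame before calling writeTo(...).partitionedBy(...). This is a minimal sketch, assuming the reported behavior (failure only when the partition column is last) holds on your runtime; move_partition_column is a hypothetical helper that only computes a new column order.

```python
from typing import List


def move_partition_column(columns: List[str], partition_col: str) -> List[str]:
    """Return a column order in which partition_col is not the last column.

    Following the observation in this thread, a trailing partition column is
    moved to the second position (right after the first column). Any other
    order is returned unchanged, as a new list.
    """
    if partition_col not in columns:
        raise ValueError(f"{partition_col!r} is not one of the columns")
    if len(columns) < 2 or columns[-1] != partition_col:
        return list(columns)
    rest = [c for c in columns if c != partition_col]
    return rest[:1] + [partition_col] + rest[1:]


print(move_partition_column(
    ["id", "other_column1", "other_column2", "partition_column"],
    "partition_column",
))
# ['id', 'partition_column', 'other_column1', 'other_column2']
```

    A possible PySpark usage, not verified here: df.select(*move_partition_column(df.columns, "ingest_date")).writeTo("csu_metastore_dev.iceberg.big_file_hcm").using("iceberg").partitionedBy("ingest_date").create(). Placing the column right after the first one mirrors the working SQL example in the post above.
    -->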
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/142918#M52051</link>
      <description>&lt;P&gt;One observation: can you first load the data into a DataFrame and then write it to a partitioned Iceberg table, rather than creating the table first and then writing to it?&lt;/P&gt;</description>
      <pubDate>Sun, 04 Jan 2026 06:51:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/142918#M52051</guid>
      <dc:creator>Sanjeeb2024</dc:creator>
      <dc:date>2026-01-04T06:51:42Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/142994#M52067</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/129689"&gt;@Sanjeeb2024&lt;/a&gt;&amp;nbsp;If your question is for me, I would say it depends on the use case.&lt;BR /&gt;If you have a very large volume of data to ingest into the table, you would prefer to create the table first and then ingest data into it using concurrent jobs.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jan 2026 09:13:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/142994#M52067</guid>
      <dc:creator>LazyGenius</dc:creator>
      <dc:date>2026-01-05T09:13:20Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/143008#M52071</link>
      <description>&lt;P&gt;I agree with you,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/166832"&gt;@LazyGenius&lt;/a&gt;. Yes, for a large volume of data it is better to create the table first and then insert the data. Is your problem resolved?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Jan 2026 10:43:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/143008#M52071</guid>
      <dc:creator>Sanjeeb2024</dc:creator>
      <dc:date>2026-01-05T10:43:21Z</dc:date>
    </item>
    <item>
      <title>Re: iceberg with partitionedBy option</title>
      <link>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/143293#M52140</link>
      <description>&lt;P&gt;Yes, my problem was already solved. I posted my observation because I found this question while searching for a fix, so hopefully it helps others.&lt;BR /&gt;Also, for reference: Databricks currently doesn't support creating an Iceberg table from a query in the same statement (CREATE TABLE AS SELECT), although you can do this with a Delta table.&lt;BR /&gt;So you need to create the table with the required schema first and then add data to it.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jan 2026 07:20:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/iceberg-with-partitionedby-option/m-p/143293#M52140</guid>
      <dc:creator>LazyGenius</dc:creator>
      <dc:date>2026-01-08T07:20:20Z</dc:date>
    </item>
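    <!--
    The create-first, load-later approach described above can be written as two separate SQL statements instead of a single CREATE TABLE AS SELECT. This is a minimal sketch under assumptions: build_iceberg_statements, the schema list, and source_view are hypothetical names, and the target schema and source view are assumed to already be accessible in Unity Catalog.

```python
from typing import List, Tuple


def build_iceberg_statements(
    table: str,
    schema: List[Tuple[str, str]],
    partition_col: str,
    source_view: str,
) -> Tuple[str, str]:
    """Return (create_sql, insert_sql) for a partitioned Iceberg table.

    The table is created empty with an explicit schema and partition spec,
    and the data is loaded by a separate INSERT statement afterwards.
    """
    names = [name for name, _ in schema]
    if partition_col not in names:
        raise ValueError(f"{partition_col!r} is not in the schema")
    # One "name TYPE" entry per line inside the column list.
    column_defs = ",\n  ".join(f"{name} {dtype}" for name, dtype in schema)
    create_sql = (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {column_defs}\n)\n"
        f"USING ICEBERG\nPARTITIONED BY ({partition_col})"
    )
    # Select columns in schema order so INSERT matches them by position.
    select_list = ", ".join(names)
    insert_sql = f"INSERT INTO {table}\nSELECT {select_list} FROM {source_view}"
    return create_sql, insert_sql


create_sql, insert_sql = build_iceberg_statements(
    "schema_name.table_name",
    [("id", "BIGINT"), ("partition_column", "STRING"), ("other_column1", "DOUBLE")],
    "partition_column",
    "staging_view",
)
print(create_sql)
print(insert_sql)
```

    On Databricks each statement could then be run with spark.sql(create_sql) followed by spark.sql(insert_sql); alternatively, the load step could use df.writeTo(table).append(). Selecting the columns explicitly in the INSERT keeps the positional order aligned with the table schema.
    -->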
  </channel>
</rss>

