<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Liquid clustering with boolean columns in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/50643#M28854</link>
    <description>&lt;P&gt;Hi community &lt;span class="lia-unicode-emoji" title=":waving_hand:"&gt;👋&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Is it possible to use boolean columns as clustering keys for liquid clustering on Delta tables?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I've been trying to set a boolean column as a clustering key, since it's used in one of my most common queries when reading from the table. I'm getting the error "&lt;EM&gt;DeltaAnalysisException: Liquid clustering requires clustering columns to have stats. Couldn't find clustering column *boolean_column_name* in stats schema&lt;/EM&gt;", which doesn't make sense, since I've made sure to collect statistics on all columns (delta.dataSkippingNumIndexedCols = -1) and can see stats on the boolean columns as well.&lt;/P&gt;&lt;P&gt;I came across &lt;A href="https://milescole.dev/optimization/2023/10/08/Delta-Table-Maintenance-101.html" target="_blank" rel="noopener"&gt;this post&lt;/A&gt; by Miles Cole, which says that boolean data types cannot be used as clustering keys, but this is not stated anywhere in the official documentation.&lt;/P&gt;&lt;P&gt;Any input is appreciated!&lt;/P&gt;</description>
    <pubDate>Wed, 08 Nov 2023 12:53:03 GMT</pubDate>
    <dc:creator>Karin</dc:creator>
    <dc:date>2023-11-08T12:53:03Z</dc:date>
    <item>
      <title>Liquid clustering with boolean columns</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/50643#M28854</link>
      <description>&lt;P&gt;Hi community &lt;span class="lia-unicode-emoji" title=":waving_hand:"&gt;👋&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Is it possible to use boolean columns as clustering keys for liquid clustering on Delta tables?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I've been trying to set a boolean column as a clustering key, since it's used in one of my most common queries when reading from the table. I'm getting the error "&lt;EM&gt;DeltaAnalysisException: Liquid clustering requires clustering columns to have stats. Couldn't find clustering column *boolean_column_name* in stats schema&lt;/EM&gt;", which doesn't make sense, since I've made sure to collect statistics on all columns (delta.dataSkippingNumIndexedCols = -1) and can see stats on the boolean columns as well.&lt;/P&gt;&lt;P&gt;I came across &lt;A href="https://milescole.dev/optimization/2023/10/08/Delta-Table-Maintenance-101.html" target="_blank" rel="noopener"&gt;this post&lt;/A&gt; by Miles Cole, which says that boolean data types cannot be used as clustering keys, but this is not stated anywhere in the official documentation.&lt;/P&gt;&lt;P&gt;Any input is appreciated!&lt;/P&gt;</description>
      <pubDate>Wed, 08 Nov 2023 12:53:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/50643#M28854</guid>
      <dc:creator>Karin</dc:creator>
      <dc:date>2023-11-08T12:53:03Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid clustering with boolean columns</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/55194#M30264</link>
      <description>&lt;P&gt;Can confirm that boolean columns are not allowed for liquid clustering. This seems to be undocumented, and the error message is not helpful: "couldn't find clustering column in stats schema".&lt;/P&gt;</description>
      <pubDate>Wed, 13 Dec 2023 13:18:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/55194#M30264</guid>
      <dc:creator>jeroenvs</dc:creator>
      <dc:date>2023-12-13T13:18:05Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid clustering with boolean columns</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/131438#M49087</link>
      <description>&lt;P&gt;I also found that boolean columns are not supported by liquid clustering. Why is that? In any case:&lt;/P&gt;&lt;P&gt;There is now an &lt;A href="https://docs.databricks.com/aws/en/error-messages/error-classes#delta_clustering_columns_datatype_not_supported" target="_self"&gt;error that can be raised&lt;/A&gt;, called DELTA_CLUSTERING_COLUMNS_DATATYPE_NOT_SUPPORTED.&lt;/P&gt;&lt;P&gt;There is also now &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/clustering#choose-clustering-keys" target="_self"&gt;documentation of the column data types that liquid clustering supports&lt;/A&gt;:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Clustering supports the following data types for clustering keys:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Date&lt;/LI&gt;&lt;LI&gt;Timestamp&lt;/LI&gt;&lt;LI&gt;TimestampNTZ (requires Databricks Runtime 14.3 LTS or above)&lt;/LI&gt;&lt;LI&gt;String&lt;/LI&gt;&lt;LI&gt;Integer&lt;/LI&gt;&lt;LI&gt;Long&lt;/LI&gt;&lt;LI&gt;Short&lt;/LI&gt;&lt;LI&gt;Float&lt;/LI&gt;&lt;LI&gt;Double&lt;/LI&gt;&lt;LI&gt;Decimal&lt;/LI&gt;&lt;LI&gt;Byte&lt;/LI&gt;&lt;/UL&gt;&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Tue, 09 Sep 2025 16:15:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/131438#M49087</guid>
      <dc:creator>SFDataEng</dc:creator>
      <dc:date>2025-09-09T16:15:49Z</dc:date>
    </item>
  </channel>
</rss>

