<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unsupported datatype 'TimestampNTZType' with liquid clustering in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55764#M30428</link>
    <description>&lt;P&gt;I'm experimenting with liquid clustering and have some questions about which column types are compatible (somewhat similar to &lt;LI-MESSAGE title="Liquid clustering with boolean columns" uid="50643" url="https://community.databricks.com/t5/data-engineering/liquid-clustering-with-boolean-columns/m-p/50643#U50643" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-forum-thread lia-fa-icon lia-fa-forum lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;).&lt;/P&gt;&lt;P&gt;The table was created as follows:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;CREATE TABLE IF NOT EXISTS &amp;lt;TABLE&amp;gt;
(  
    _time DOUBLE
  , timestamp TIMESTAMP_NTZ
  , aid STRING
  , aip STRING
  , cid STRING
  , TargetProcessId BIGINT
)
USING delta CLUSTER BY (timestamp, aid, TargetProcessId) LOCATION &amp;lt;LOCATION&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;The table is filled via .writeStream(), so I'm under the impression that the data won't be clustered on write. Therefore, I run&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;OPTIMIZE &amp;lt;TABLE&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;which fails with the following error (on "spark_version": "14.2.x-cpu-ml-scala2.12"):&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;IllegalArgumentException: Unsupported datatype 'TimestampNTZType'
com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: java.lang.IllegalArgumentException: Unsupported datatype 'TimestampNTZType'
	at com.databricks.sql.io.skipping.Classifier$.getStatsForCol(Classifier.scala:180)
	at com.databricks.sql.io.skipping.SimpleClassifier.applyActionToIntersectingBinaryNode(Classifier.scala:424)&lt;/LI-CODE&gt;&lt;P&gt;I was able to create the table with the CLUSTER BY clause, so I assumed the column list was acceptable, but this exception indicates otherwise. If TimestampNTZType is an unsupported datatype, where is this documented?&lt;/P&gt;&lt;P&gt;When I run DESCRIBE DETAIL on the table, the output shows that the clustering columns were accepted:&lt;/P&gt;&lt;TABLE width="783px"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH width="81.9px"&gt;format&lt;/TH&gt;&lt;TH width="190.75px"&gt;partitionColumns&lt;/TH&gt;&lt;TH width="398.5px"&gt;clusteringColumns&lt;/TH&gt;&lt;TH width="110.85px"&gt;numFiles&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="81.9px"&gt;delta&lt;/TD&gt;&lt;TD width="190.75px"&gt;[]&lt;/TD&gt;&lt;TD width="398.5px"&gt;["timestamp","aid","TargetProcessId"]&lt;/TD&gt;&lt;TD width="110.85px"&gt;382&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
    <pubDate>Tue, 26 Dec 2023 22:55:57 GMT</pubDate>
    <dc:creator>hukel</dc:creator>
    <dc:date>2023-12-26T22:55:57Z</dc:date>
    <item>
      <title>Unsupported datatype 'TimestampNTZType' with liquid clustering</title>
      <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55764#M30428</link>
      <pubDate>Tue, 26 Dec 2023 22:55:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55764#M30428</guid>
      <dc:creator>hukel</dc:creator>
      <dc:date>2023-12-26T22:55:57Z</dc:date>
    </item>
    <item>
      <title>Re: Unsupported datatype 'TimestampNTZType' with liquid clustering</title>
      <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55804#M30437</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Just an educated guess: the liquid clustering docs list this limitation: &lt;STRONG&gt;You can only specify columns with statistics collected for clustering keys&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Perhaps it is related to the data types for which statistics can be collected?&lt;/P&gt;&lt;P&gt;But I could not find any related docs either &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Dec 2023 13:42:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55804#M30437</guid>
      <dc:creator>Wojciech_BUK</dc:creator>
      <dc:date>2023-12-27T13:42:57Z</dc:date>
    </item>
    <item>
      <title>Re: Unsupported datatype 'TimestampNTZType' with liquid clustering</title>
      <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55811#M30438</link>
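      <description>&lt;P&gt;Yes, I think you are correct. When I run this,&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;describe extended &amp;lt;table&amp;gt; timestamp&lt;/LI-CODE&gt;&lt;P&gt;I can see that no statistics have been calculated for the column:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;col_name&lt;/TD&gt;&lt;TD&gt;timestamp&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;data_type&lt;/TD&gt;&lt;TD&gt;timestamp_ntz&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;comment&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;min&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;max&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;num_nulls&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;distinct_count&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;avg_col_len&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;max_col_len&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;histogram&lt;/TD&gt;&lt;TD&gt;NULL&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Now for a rookie question: is there a list of data types for which statistics are not tracked?&lt;/P&gt;</description>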
      <pubDate>Wed, 27 Dec 2023 14:16:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55811#M30438</guid>
      <dc:creator>hukel</dc:creator>
      <dc:date>2023-12-27T14:16:27Z</dc:date>
    </item>
    <item>
      <title>Re: Unsupported datatype 'TimestampNTZType' with liquid clustering</title>
      <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55817#M30439</link>
      <description>&lt;P&gt;Running this populates the statistics for the columns:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ANALYZE TABLE &amp;lt;TABLE&amp;gt; COMPUTE STATISTICS FOR COLUMNS timestamp,aid,ContextProcessId&lt;/LI-CODE&gt;&lt;P&gt;But I still get the error when I run OPTIMIZE:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Unsupported datatype 'TimestampNTZType'
com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: java.lang.IllegalArgumentException: Unsupported datatype 'TimestampNTZType'
	at com.databricks.sql.io.skipping.Classifier$.getStatsForCol(Classifier.scala:180)
	at com.databricks.sql.io.skipping.SimpleClassifier.applyActionToIntersectingBinaryNode(Classifier.scala:424)
	at com.databricks.sql.io.skipping.SimpleClassifier.applyActionToIntersectingNode(Classifier.scala:405)
	at com.databricks.sql.io.skipping.SimpleClassifier.$anonfun$classifyForOptimize$3(Classifier.scala:268)
	at com.databricks.sql.io.skipping.SimpleClassifier.$anonfun$classifyForOptimize$3$adapted(Classifier.scala:266)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 27 Dec 2023 15:19:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55817#M30439</guid>
      <dc:creator>hukel</dc:creator>
      <dc:date>2023-12-27T15:19:33Z</dc:date>
    </item>
    <item>
      <title>Re: Unsupported datatype 'TimestampNTZType' with liquid clustering</title>
      <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55818#M30440</link>
      <description>&lt;P&gt;Sorry, I can't find any docs for that limitation &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Dec 2023 16:01:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55818#M30440</guid>
      <dc:creator>Wojciech_BUK</dc:creator>
      <dc:date>2023-12-27T16:01:26Z</dc:date>
    </item>
    <item>
      <title>Re: Unsupported datatype 'TimestampNTZType' with liquid clustering</title>
      <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55824#M30441</link>
      <description>&lt;P&gt;I'm not sure if this is related, but I've hit another problem with TIMESTAMP_NTZ columns.&lt;/P&gt;&lt;P&gt;As soon as I compute statistics on a TIMESTAMP_NTZ column in a table, I can no longer use that column in a date-range condition in a WHERE clause.&lt;/P&gt;&lt;P&gt;This query&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;-- set the variable in advance to eliminate any cast/conversion issues in the WHERE clause
DECLARE OR REPLACE VARIABLE dt_begin TIMESTAMP_NTZ DEFAULT '2023-12-04T16:00';

SELECT    event.timestamp as time
        , event.FirstIP4Record as DestIP
        , event.DomainName as DNSName
        , dt_begin
FROM dnsrequest event
WHERE 
  timestamp &amp;gt;= dt_begin
--   timestamp IS NOT NULL&lt;/LI-CODE&gt;&lt;P&gt;fails with:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MatchError: TimestampNTZType (of class org.apache.spark.sql.types.TimestampNTZType$)
com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: scala.MatchError: TimestampNTZType (of class org.apache.spark.sql.types.TimestampNTZType$)
	at com.databricks.sql.optimizer.statsEstimation.FilterEstimation.evaluateBinary(FilterEstimation.scala:523)
	at com.databricks.sql.optimizer.statsEstimation.FilterEstimation.calculateSingleCondition(FilterEstimation.scala:400)&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 27 Dec 2023 16:38:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/55824#M30441</guid>
      <dc:creator>hukel</dc:creator>
      <dc:date>2023-12-27T16:38:00Z</dc:date>
    </item>
    <item>
      <title>Re: Unsupported datatype 'TimestampNTZType' with liquid clustering</title>
      <link>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/56672#M30619</link>
      <description>&lt;P&gt;Per support: "TimestampNTZ data skipping is not yet supported."&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jan 2024 16:08:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unsupported-datatype-timestampntztype-with-liquid-clustering/m-p/56672#M30619</guid>
      <dc:creator>hukel</dc:creator>
      <dc:date>2024-01-08T16:08:55Z</dc:date>
    </item>
  </channel>
</rss>

