Delta log statistics - timestamp type not working

pantelis_mare
Contributor III

Hello team!

As per the documentation, I understand that the table statistics can be fetched through the delta log (eg min, max, count) in order to not read the underlying data of a delta table.

This is the case for numerical types, and timestamp is supposed to be supported.

In my test example though, when it comes to the timestamp column, the data are fetched as shown by the query plan, instead of reading the delta_log.

Any thoughts on how to make spark read the max values of the timestamp through the delta log?

Delta table creation:

Seq(
  (1, java.sql.Timestamp.valueOf(java.time.LocalDateTime.now())), 
  (2, java.sql.Timestamp.valueOf(java.time.LocalDateTime.now().minusDays(1))),
  (3, java.sql.Timestamp.valueOf(java.time.LocalDateTime.now().minusDays(2)))
  )
.toDF("value", "ts")
.write
.saveAsTable("test.test_ts")

As expected, max for column value is read by the log:

max valuebut this is not the case for the ts column:

image.pngwhile the delta log contains the needed information (max of ts column):

{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"9f6cbb6f-e866-4279-89a3-609bacf175cb","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"value\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"ts\",\"type\":\"timestamp\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1658749213255}}
{"add":{"path":"part-00000-0848b536-6c7b-4429-b168-44d53ec48ba4-c000.snappy.parquet","partitionValues":{},"size":852,"modificationTime":1658749217000,"dataChange":true,"stats":"{\"numRecords\":1,\"minValues\":{\"value\":1,\"ts\":\"2022-07-25T11:40:12.884Z\"},\"maxValues\":{\"value\":1,\"ts\":\"2022-07-25T11:40:12.884Z\"},\"nullCount\":{\"value\":0,\"ts\":0}}","tags":{"INSERTION_TIME":"1658749217000000","OPTIMIZE_TARGET_SIZE":"268435456"}}}
{"add":{"path":"part-00001-0c883bda-bfc3-4921-a46f-32eaa9fb7dbc-c000.snappy.parquet","partitionValues":{},"size":852,"modificationTime":1658749217000,"dataChange":true,"stats":"{\"numRecords\":1,\"minValues\":{\"value\":2,\"ts\":\"2022-07-24T11:40:12.885Z\"},\"maxValues\":{\"value\":2,\"ts\":\"2022-07-24T11:40:12.885Z\"},\"nullCount\":{\"value\":0,\"ts\":0}}","tags":{"INSERTION_TIME":"1658749217000001","OPTIMIZE_TARGET_SIZE":"268435456"}}}
{"add":{"path":"part-00002-54975e9b-3df8-4382-afa0-2929778ecb0a-c000.snappy.parquet","partitionValues":{},"size":853,"modificationTime":1658749217000,"dataChange":true,"stats":"{\"numRecords\":1,\"minValues\":{\"value\":3,\"ts\":\"2022-07-23T11:40:12.885Z\"},\"maxValues\":{\"value\":3,\"ts\":\"2022-07-23T11:40:12.885Z\"},\"nullCount\":{\"value\":0,\"ts\":0}}","tags":{"INSERTION_TIME":"1658749217000002","OPTIMIZE_TARGET_SIZE":"268435456"}}}
{"commitInfo":{"timestamp":1658749218051,"userId":"6999390541537531","userName":"p.maroudis@kaizengaming.com","operation":"CREATE TABLE AS SELECT","operationParameters":{"isManaged":"true","description":null,"partitionBy":"[]","properties":"{}"},"notebook":{"notebookId":"2851472110292823"},"clusterId":"0920-080342-who267","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"3","numOutputRows":"3","numOutputBytes":"2557"},"engineInfo":"Databricks-Runtime/10.4.x-scala2.12","txnId":"775ac353-9df2-45a2-99d2-1a15783c0e40"}}