Z-Ordering Timestamp Column

numersoz
New Contributor III

Hi,

I've large Delta Table for IoT data for over 10K different sensors with timestamp, sensor name and value columns at 1 second precision.

Query pattern is usually random 5-100 sensors at a time. But typically involves specific year/month/day interval.

This table grows with Auto Loader every 4 hours, via appending pre-landing parquet files to Delta Table.

I'm planning to create year, year-month, year-month-day columns and Z-Order by these to improve query time.

My question is, when SQL query is based on using the timestamp column, will it take advantage of Z-Ordering? Or do I have to specify the calculated columns (year, year-month, year-month-day) with a where clause?

Geeta1
Databricks Partner

Hi @Nurettin Ersoz​ , you have to specify the column you are using for Z-Ordering in the SQL query.

It seems like you want to specify multiple columns for ZORDER BY as a comma-separated list. But the effectiveness of the locality drops with each additional column. So, the performance may not be better.

https://www.youtube.com/watch?v=A1aR1A8OwOU

Above video might help you. Have a look at it.

View solution in original post

numersoz
New Contributor III

Hi,

Thank you this helps!

Oliver_Angelil
Valued Contributor II

@numersoz did you z-order on the timestamp column or on less granular columns, like Year, Month, or Day. timestamp column is very granular (high cardinality) since it also includes hour, minute, second...