My observations show that a timestamp difference has the type INTERVAL DAY TO SECOND:
select typeof(getdate() - current_date())
-----------------------------------------
interval day to second
But is it guaranteed? Can it be DAY TO MINUTE or, say, YEAR T...
I tried the contact details at the bottom, but they seem to be generic Databricks contact and support links. The issue I faced was this: I think this word made its way onto the stop list by mistake.
I have a table whose full scan takes ~20 minutes on my cluster. The table has a "Time" TIMESTAMP column and a "day" DATE column. The latter is computed (manually) as "Time" truncated to day and is used for partitioning. I query the table using predicate ...
I am looking at the EXPLAIN EXTENDED plan for a statement. In the == Physical Plan == section, I go down to the FileScan node and see a lot of ellipses, like +- FileScan parquet schema.table[Time#8459,TagName#8460,Value#8461,Quality#8462,day#8...
I think I found it: https://stackoverflow.com/a/57891876/947012 This can explain the performance because, thanks to partitioning, most files can be skipped based on parquet metadata. Partitioning is not used as a feature, but contributes to organization ...
To elaborate on point #1, even on a simple query like "select count(Time) from mytable where Time between '2018-11-27' and '2018-11-30'" with a short Time interval, the number of files read is consistently small and the number of pruned files is consi...
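For comparison, here is a minimal sketch of the same query with an explicit predicate on the partition column. It assumes the table is the one described earlier, with "day" being "Time" truncated to day and used for partitioning; the table name mytable comes from the query above, and the added condition is only an illustration, not something I have measured.

```sql
-- Same count as above, with an added filter on the partition column "day".
-- Assumption: day = Time truncated to day, so the extra predicate does not
-- change the result; it only lets the engine prune partitions directly
-- instead of relying on per-file parquet statistics for Time.
select count(Time)
from mytable
where Time between '2018-11-27' and '2018-11-30'
  and day between date'2018-11-27' and date'2018-11-30';
```

Running EXPLAIN EXTENDED on both variants and comparing the FileScan nodes should show whether the partition filter is actually applied.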
Please don't be sorry, I appreciate your willingness to help! Yes, the performance I have is in the "cold" state, with no full scans prior to that (and actually it does not have a big impact in my case; a filtered scan after a full scan is not faster than a cold one,...