In Databricks Runtime 7.2 and below, column-level statistics are stored in Delta Lake checkpoints as a JSON column.
In Databricks Runtime 7.3 LTS and above, column-level statistics are stored as a struct instead. The struct format makes Delta Lake reads much faster, for two reasons:
- Delta Lake doesn’t perform expensive JSON parsing to obtain column-level statistics.
- Parquet column pruning capabilities significantly reduce the I/O required to read the statistics for a column.
The struct format enables a collection of optimizations that reduce the overhead of Delta Lake read operations from seconds to tens of milliseconds, which significantly reduces the latency for short queries.
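
The difference between the two layouts can be illustrated with a small, self-contained Python sketch. It is a toy model, not Delta Lake's implementation: the file names, the `id` column, and the helper functions `files_to_read_json` and `files_to_read_struct` are hypothetical, and plain Python lists stand in for Parquet columns. The point is only that the JSON layout forces a parse of every statistics string, while the struct layout lets a reader touch just the fields it needs.

```python
import json

# Hypothetical per-file statistics for an "id" column (illustrative values only).
FILES = [
    {"path": "part-0000.parquet", "min_id": 0, "max_id": 99},
    {"path": "part-0001.parquet", "min_id": 100, "max_id": 199},
    {"path": "part-0002.parquet", "min_id": 200, "max_id": 299},
]
paths = [f["path"] for f in FILES]

# JSON layout: one opaque string per file; every lookup must parse the whole string.
json_stats = [
    json.dumps({"minValues": {"id": f["min_id"]}, "maxValues": {"id": f["max_id"]}})
    for f in FILES
]

# Struct layout (modeled as one list per field): each statistic is its own column,
# so a reader can load only the fields relevant to the query.
struct_stats = {
    "minValues.id": [f["min_id"] for f in FILES],
    "maxValues.id": [f["max_id"] for f in FILES],
}


def files_to_read_json(value):
    """Select files whose [min, max] range may contain `value`, parsing JSON per file."""
    selected = []
    for path, raw in zip(paths, json_stats):
        stats = json.loads(raw)  # the parsing cost is paid for every file
        if stats["minValues"]["id"] <= value <= stats["maxValues"]["id"]:
            selected.append(path)
    return selected


def files_to_read_struct(value):
    """Same selection, reading only the two statistic columns that are needed."""
    mins = struct_stats["minValues.id"]
    maxs = struct_stats["maxValues.id"]
    return [p for p, lo, hi in zip(paths, mins, maxs) if lo <= value <= hi]


print(files_to_read_json(150))
print(files_to_read_struct(150))
```

Both functions select the same files; only the work needed to reach the statistics differs. In a real checkpoint the savings would come from the Parquet reader skipping unneeded columns, which this sketch only imitates.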