
Delta Lake checkpoints storage concept

User16826994223
Honored Contributor III

In which format are checkpoints stored, and how do they help Delta increase performance?

1 ACCEPTED SOLUTION

User16826994223
Honored Contributor III

Delta Lake writes checkpoints as an aggregate state of a Delta table every 10 commits. These checkpoints serve as the starting point to compute the latest state of the table. Without checkpoints, Delta Lake would have to read a large collection of JSON files ("delta" files) representing commits to the transaction log to compute the state of a table. In addition, the column-level statistics Delta Lake uses to perform data skipping are stored in the checkpoint.
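
To see this on disk, here is a minimal sketch, assuming a local PySpark session with the delta-spark pip package installed (on Databricks the configured builder is not needed). It makes 11 commits to an arbitrary demo path and then lists the _delta_log directory, where the JSON commit files sit alongside the Parquet checkpoint:

```python
# Minimal sketch, assuming a local PySpark session with the delta-spark
# pip package installed; the demo table path is arbitrary.
import os
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("checkpoint-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

path = "/tmp/checkpoint_demo"  # arbitrary local path for the demo table

# Each append below is one commit; with the default checkpoint interval
# of 10, Delta writes a checkpoint once commit 10 lands.
for i in range(11):
    spark.range(i, i + 1).write.format("delta").mode("append").save(path)

# The log directory now holds one JSON file per commit plus a Parquet
# checkpoint (e.g. 00000000000000000010.checkpoint.parquet) and a
# _last_checkpoint pointer file.
print(sorted(os.listdir(os.path.join(path, "_delta_log"))))
```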


REPLIES

User16826994223
Honored Contributor III

In Databricks Runtime 7.2 and below, column-level statistics are stored in Delta Lake checkpoints as a JSON column.

In Databricks Runtime 7.3 LTS and above, column-level statistics are stored as a struct. The struct format makes Delta Lake reads much faster, because:

  • Delta Lake doesn't perform expensive JSON parsing to obtain column-level statistics.
  • Parquet column pruning capabilities significantly reduce the I/O required to read the statistics for a column.

The struct format enables a collection of optimizations that reduce the overhead of Delta Lake read operations from seconds to tens of milliseconds, which significantly reduces the latency for short queries.
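
To make the JSON-versus-struct difference concrete, here is a hedged sketch that reads the checkpoint file directly, continuing from the example above. The add.stats and add.stats_parsed field names are assumptions based on the Delta checkpoint schema, and stats_parsed only appears when struct-format stats are enabled for the table:

```python
# Hedged sketch: read the checkpoint Parquet file directly to see how the
# column-level statistics are stored. Continues from the table above; the
# add.stats / add.stats_parsed field names follow the Delta checkpoint
# schema (stats_parsed only appears when struct-format stats are enabled).
from pyspark.sql import functions as F

checkpoint = f"{path}/_delta_log/00000000000000000010.checkpoint.parquet"
chk = spark.read.parquet(checkpoint)

# Look for add.stats (a JSON string per file) vs. add.stats_parsed (a struct).
chk.printSchema()

# JSON-format stats: every reader has to parse these strings.
chk.where(F.col("add").isNotNull()).select("add.path", "add.stats").show(truncate=False)

# Struct-format stats can be toggled per table via a table property
# (property name taken from the Delta/Databricks docs):
#   ALTER TABLE delta.`/tmp/checkpoint_demo`
#   SET TBLPROPERTIES ('delta.checkpoint.writeStatsAsStruct' = 'true')
```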

aladda
Honored Contributor II

Great points above on how checkpointing helps with performance. In addition, Delta Lake also provides other data organization strategies, such as compaction and Z-ordering, to help with both the read and write performance of Delta tables. Additional details here: https://docs.databricks.com/delta/optimizations/file-mgmt.html
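
For reference, a minimal sketch of what those file-management commands look like when run from Python via spark.sql; the table name `events` and the Z-order column `event_date` are placeholders:

```python
# Hedged sketch of the file-management commands the linked page covers;
# `events` and `event_date` are placeholder names.
spark.sql("OPTIMIZE events ZORDER BY (event_date)")  # compact small files and co-locate rows by event_date
spark.sql("VACUUM events")  # remove files no longer referenced by the table (default 7-day retention)
```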
