06-22-2021 04:45 AM
Delta Lake writes a checkpoint, an aggregate state of a Delta table, every 10 commits by default. These checkpoints serve as the starting point to compute the latest state of the table. Without checkpoints, Delta Lake would have to read a large collection of JSON files (“delta” files) representing commits to the transaction log to compute the state of a table. In addition, the column-level statistics Delta Lake uses to perform data skipping are stored in the checkpoint.
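To make the "starting point" idea concrete, here is a minimal sketch of how a reader could pick which log files to replay: the newest checkpoint plus only the JSON commits written after it. It assumes the usual _delta_log naming (20-digit zero-padded versions) and single-file checkpoints only; the helper names files_to_replay and log_name are hypothetical, not part of any Delta Lake API.

```python
import re

# Assumption: log files are named with a zero-padded 20-digit version,
# and checkpoints are single-file (multi-part checkpoints are ignored here).
_COMMIT = re.compile(r"^(\d{20})\.json$")
_CHECKPOINT = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def files_to_replay(log_files):
    """Return (latest checkpoint file or None, commit files newer than it)."""
    commits = {}
    checkpoints = {}
    for name in log_files:
        m = _COMMIT.match(name)
        if m:
            commits[int(m.group(1))] = name
            continue
        m = _CHECKPOINT.match(name)
        if m:
            checkpoints[int(m.group(1))] = name
    # With no checkpoint, every commit from version 0 must be replayed.
    start = max(checkpoints) if checkpoints else -1
    checkpoint = checkpoints.get(start)
    later = [commits[v] for v in sorted(commits) if v > start]
    return checkpoint, later


def log_name(version, suffix):
    """Hypothetical helper: build a log file name for the example listing."""
    return f"{version:020d}.{suffix}"


# Example listing: commits 0..13 and a checkpoint written at version 10.
listing = [log_name(v, "json") for v in range(14)]
listing.append(log_name(10, "checkpoint.parquet"))

checkpoint, later = files_to_replay(listing)
print(checkpoint)
print(later)
```

With the checkpoint present, only the commits after version 10 need to be read; without it, all 14 JSON files would have to be replayed.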
06-22-2021 04:46 AM
In Databricks Runtime 7.2 and below, column-level statistics are stored in Delta Lake checkpoints as a JSON column.
In Databricks Runtime 7.3 LTS and above, column-level statistics are stored as a struct. The struct format makes Delta Lake reads much faster because it enables a collection of optimizations that reduce the overhead of Delta Lake read operations from seconds to tens of milliseconds, which significantly reduces the latency for short queries.
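As a rough illustration of what these column-level statistics are used for, the sketch below does min/max data skipping over statistics stored as JSON strings. The stats layout (numRecords, minValues, maxValues) follows the shape Delta Lake records per file; the file names, the id column, and the functions may_contain and files_to_scan are made up for the example. Note the json.loads call per file: that parsing step is the kind of overhead a struct representation would not need, though this is only a toy model and not how the Databricks runtime is implemented.

```python
import json


def may_contain(stats, column, value):
    """Return False only when min/max stats prove the value is absent."""
    lo = stats.get("minValues", {}).get(column)
    hi = stats.get("maxValues", {}).get(column)
    if lo is None or hi is None:
        return True  # no statistics for this column: the file cannot be skipped
    return lo <= value <= hi


def files_to_scan(files, column, value):
    """Pick the files that could hold rows where column == value."""
    selected = []
    for path, stats_json in files:
        # JSON-format statistics have to be parsed for every file.
        stats = json.loads(stats_json)
        if may_contain(stats, column, value):
            selected.append(path)
    return selected


# Hypothetical files and statistics, for illustration only.
files = [
    ("part-0000.parquet", json.dumps(
        {"numRecords": 100, "minValues": {"id": 1}, "maxValues": {"id": 100}})),
    ("part-0001.parquet", json.dumps(
        {"numRecords": 100, "minValues": {"id": 101}, "maxValues": {"id": 200}})),
    ("part-0002.parquet", json.dumps(
        {"numRecords": 50, "minValues": {}, "maxValues": {}})),
]

result = files_to_scan(files, "id", 150)
print(result)
```

The first file is skipped because its id range ends at 100; the third is kept because it has no statistics to rule it out.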
06-22-2021 09:14 PM
Great points above on how checkpointing helps with performance. In addition, Delta Lake provides other data organization strategies, such as compaction and Z-ordering, to help with both read and write performance of Delta tables. Additional details are here: https://docs.databricks.com/delta/optimizations/file-mgmt.html