Don't want checkpoint in delta
06-22-2021 04:53 AM
Suppose I am not interested in checkpoints. How can I disable checkpoint writes in Delta?
Labels: Checkpoint, Delta
06-22-2021 03:57 PM
Checkpoint creation in Delta is not a user-controllable feature or option, so it cannot be switched off. It is possible to delay checkpoint file creation, but doing so could hurt the performance of the Delta table. By default, a checkpoint file is created every 10 commits on the Delta table.
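As a rough illustration of delaying checkpoints, the sketch below builds an ALTER TABLE statement that raises the interval. It assumes the `delta.checkpointInterval` table property is what governs the interval (the reply above does not name it); the table name `events`, the value 100, and the helper `checkpoint_interval_sql` are hypothetical.

```python
def checkpoint_interval_sql(table_name: str, interval: int) -> str:
    """Build an ALTER TABLE statement that sets delta.checkpointInterval.

    Assumption: delta.checkpointInterval is the table property that
    controls how many commits pass between checkpoints.
    """
    if interval < 1:
        raise ValueError("interval must be a positive number of commits")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.checkpointInterval' = '{interval}')"
    )


# 'events' is a hypothetical table name; 100 is an arbitrary example value.
statement = checkpoint_interval_sql("events", 100)
print(statement)

# On a cluster with an active SparkSession named `spark`, one would then run:
# spark.sql(statement)
```

A larger interval means fewer checkpoint files but more JSON commit files to replay when the table state is reconstructed, which is the performance impact mentioned above.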
06-22-2021 05:13 PM
Writing statistics in a checkpoint has a cost that is usually visible only for very large tables. However, these statistics are very useful for data skipping, which speeds up subsequent operations.
In Databricks Runtime 7.2 and below, column-level statistics are stored in Delta Lake checkpoints as a JSON column. In Databricks Runtime 7.3 LTS and above, column-level statistics are stored as a struct; the struct format makes Delta Lake reads much faster.
Two table properties control column-level statistics in checkpoints: delta.checkpoint.writeStatsAsJson and delta.checkpoint.writeStatsAsStruct. If both table properties are false, no statistics are collected or written, and readers won't be able to perform data skipping.
For more details on tradeoffs with statistics and checkpoints, see here
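The two properties above could be set with a statement like the one this sketch builds. The property names come from the reply; the table name `events`, the helper `checkpoint_stats_sql`, and the chosen values (struct on, JSON off) are illustrative assumptions, not a recommendation from the thread.

```python
def checkpoint_stats_sql(table_name: str, as_json: bool, as_struct: bool) -> str:
    """Build an ALTER TABLE statement for the two checkpoint statistics properties."""
    props = {
        "delta.checkpoint.writeStatsAsJson": as_json,
        "delta.checkpoint.writeStatsAsStruct": as_struct,
    }
    # Render each property as 'name' = 'true' or 'name' = 'false'.
    rendered = ", ".join(
        f"'{name}' = '{str(value).lower()}'" for name, value in props.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({rendered})"


# 'events' is a hypothetical table name.
stats_statement = checkpoint_stats_sql("events", as_json=False, as_struct=True)
print(stats_statement)

# On a cluster with an active SparkSession named `spark`, one would then run:
# spark.sql(stats_statement)
```

Passing False for both arguments produces the configuration the reply warns about, where readers cannot perform data skipping.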

