Data Engineering

Don't want checkpoint in delta

User16826994223
Honored Contributor III

Suppose I am not interested in checkpoints. How can I disable checkpoint writes in Delta?

2 REPLIES

brickster_2018
Databricks Employee

Checkpoint creation in Delta is not a user-controllable feature or option, so it cannot be switched off. It is possible to delay checkpoint file creation, but doing so could hurt the performance of the Delta table. By default, a checkpoint file is created every 10 commits to the Delta table.
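To illustrate what delaying checkpoint creation might look like: Delta Lake exposes the interval as the table property `delta.checkpointInterval`. The sketch below only builds the SQL statement. The table name `events`, the helper `checkpoint_interval_sql`, and the value 100 are hypothetical choices for illustration, not recommendations.

```python
def checkpoint_interval_sql(table_name: str, interval: int) -> str:
    """Build an ALTER TABLE statement that sets delta.checkpointInterval.

    `table_name` and `interval` are illustrative inputs; the property name
    is the Delta Lake table property for the checkpoint interval.
    """
    if interval < 1:
        raise ValueError("interval must be a positive number of commits")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.checkpointInterval' = '{interval}')"
    )


# Hypothetical table name and interval, chosen only for the example.
statement = checkpoint_interval_sql("events", 100)
print(statement)

# In a notebook with an active SparkSession named `spark` (assumed),
# the statement could then be run with:
# spark.sql(statement)
```

A larger interval means fewer checkpoint writes, but readers have to replay more JSON commit files to reconstruct table state, which is the performance tradeoff mentioned above.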

sajith_appukutt
Honored Contributor II

Writing statistics in a checkpoint has a cost that is usually visible only for very large tables. However, these statistics are very useful for data skipping, which speeds up subsequent operations.

In Databricks Runtime 7.2 and below, column-level statistics are stored in Delta Lake checkpoints as a JSON column. In Databricks Runtime 7.3 LTS and above, column-level statistics are stored as a struct (the struct format makes Delta Lake reads much faster).

There are two table properties that control column-level statistics in checkpoints: delta.checkpoint.writeStatsAsJson and delta.checkpoint.writeStatsAsStruct. If both are false, no statistics are written to checkpoints, and readers won't be able to perform data skipping.
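As a rough sketch of how these two properties could be set on a table, the snippet below builds an `ALTER TABLE ... SET TBLPROPERTIES` statement. The helper `checkpoint_stats_sql` and the table name `events` are hypothetical; only the two property names come from the reply above.

```python
def checkpoint_stats_sql(table_name: str, as_json: bool, as_struct: bool) -> str:
    """Build an ALTER TABLE statement for the two checkpoint-statistics properties."""
    props = {
        "delta.checkpoint.writeStatsAsJson": as_json,
        "delta.checkpoint.writeStatsAsStruct": as_struct,
    }
    # Render booleans as lowercase 'true'/'false' string literals.
    rendered = ", ".join(
        f"'{name}' = '{str(value).lower()}'" for name, value in props.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({rendered})"


# Example: keep only the struct format (hypothetical table name).
stats_statement = checkpoint_stats_sql("events", as_json=False, as_struct=True)
print(stats_statement)

# With an active SparkSession named `spark` (assumed):
# spark.sql(stats_statement)
```

Setting both arguments to False would produce the configuration the reply warns about, where data skipping is lost.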

For more details on tradeoffs with statistics and checkpoints, see here
