Why is the Delta log checkpoint created in different formats?
10-21-2024 03:55 PM
Hi,
I'm using Databricks Runtime 15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found that the Delta log checkpoints are in mixed formats, like:
7616 00000000000003291896.checkpoint.b1c24725-....json
7616 00000000000003291906.checkpoint.873e1b3e-....json
7616 00000000000003291916.checkpoint.e14e7613-....json
7616 00000000000003291926.checkpoint.3c9a0512-....json
7616 00000000000003291936.checkpoint.ba87e77a-....json
7653 00000000000003291936.checkpoint.parquet
7616 00000000000003291946.checkpoint.daf933a4-....json
7616 00000000000003291956.checkpoint.80768fb1-....json
7614 00000000000003291961.checkpoint.59ad2faf-....json
7614 00000000000003291971.checkpoint.ddb7a4f4-....json
7614 00000000000003291981.checkpoint.45867b1a-....json
7614 00000000000003291991.checkpoint.ec13fc70-....json
Why does it have classic and v2 checkpoints mixed together?
Thanks
Labels:
- Delta Lake
- Spark
2 REPLIES
10-22-2024 03:28 AM
Your Delta log listing shows v2 checkpoints interleaved with an occasional classic checkpoint. The UUID-named files (<version>.checkpoint.<uuid>.json) are the v2 checkpoints, while the plain <version>.checkpoint.parquet file is a classic-format checkpoint. Note that the extensions can mislead here: classic checkpoints are always Parquet, whereas a v2 checkpoint's top-level file may be either JSON or Parquet, with the bulk of the state stored in sidecar files under _delta_log/_sidecars/. V2 checkpoints arrived with the v2Checkpoint table feature in Delta Lake 3.0 and are designed to make checkpoint reads and writes faster on large tables. The protocol allows both naming schemes to coexist on a table with the v2Checkpoint feature, so the mix you see is expected and not a sign of corruption.
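If it helps to tell them apart at a glance, here is a minimal sketch that classifies the files in _delta_log by the two naming schemes (the path is a placeholder; dbutils.fs.ls is the Databricks filesystem listing utility):
import re
# Classic checkpoints: <version>.checkpoint.parquet (or multi-part <version>.checkpoint.<part>.<total>.parquet)
classic = re.compile(r"^\d{20}\.checkpoint(\.\d+\.\d+)?\.parquet$")
# V2 checkpoints: <version>.checkpoint.<uuid>.json or <version>.checkpoint.<uuid>.parquet
v2 = re.compile(r"^\d{20}\.checkpoint\.[0-9a-fA-F-]{36}\.(json|parquet)$")
for f in dbutils.fs.ls("s3://my-bucket/my-table/_delta_log/"):  # placeholder path
    if classic.match(f.name):
        print("classic:", f.name)
    elif v2.match(f.name):
        print("v2:", f.name)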
Which format gets written is controlled by the table property delta.checkpointPolicy rather than a Spark session configuration (as far as I can tell, spark.databricks.delta.checkpoint.writeFormat is not a documented setting). To opt in to v2 checkpoints (my_table is a placeholder for your table):
spark.sql("ALTER TABLE my_table SET TBLPROPERTIES ('delta.checkpointPolicy' = 'v2')")
To have the writer produce classic checkpoints again (the v2Checkpoint table feature itself stays in the table's protocol until you drop it):
spark.sql("ALTER TABLE my_table SET TBLPROPERTIES ('delta.checkpointPolicy' = 'classic')")
Separately, delta.checkpoint.writeStatsAsStruct controls whether file statistics are written as a parsed struct column inside Parquet checkpoint files; it does not choose between classic and v2 checkpoints.
Cleanup:
Checkpoints are written automatically, by default every 10 commits (delta.checkpointInterval), and old log entries and checkpoints are cleaned up automatically after delta.logRetentionDuration (30 days by default). OPTIMIZE and VACUUM operate on the table's data files, not on the _delta_log, so running them will not change which checkpoint formats appear.
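You can verify what your table currently has set (my_table is a placeholder):
spark.sql("SHOW TBLPROPERTIES my_table").show(truncate=False)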
Recommendation:
Databricks Runtime 15.4 LTS and 14.3 LTS both support v2 checkpoints, so I recommend staying on the v2 policy to benefit from faster log processing; the occasional classic-format checkpoint alongside them is harmless.
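One more thing worth testing: open-source Delta has a writer config that chooses the format of the top-level v2 checkpoint file, which may be why your top-level files come out as JSON. I haven't verified this config on every DBR version, so treat the name as an assumption to check against your runtime:
# Assumption: OSS Delta config selecting the top-level v2 checkpoint file format ("json" or "parquet")
spark.conf.set("spark.databricks.delta.checkpointV2.topLevelFileFormat", "parquet")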
10-24-2024 11:39 AM
Thanks. We use a job to load data from Kinesis into a Delta table. I added
spark.databricks.delta.checkpoint.writeFormat parquet
spark.databricks.delta.checkpoint.writeStatsAsStruct true
to the job cluster's Spark config, but the checkpoints still show different formats. The table properties have:
delta.checkpointPolicy v2

