Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Why is the delta log checkpoint created in different formats?

Brad
Contributor II

Hi,
I'm using Databricks Runtime 15.4 LTS (also 14.3 LTS). When loading a Delta Lake table from Kinesis, I found the delta log checkpoints are written in mixed formats, like:

7616 00000000000003291896.checkpoint.b1c24725-....json
7616 00000000000003291906.checkpoint.873e1b3e-....json
7616 00000000000003291916.checkpoint.e14e7613-....json
7616 00000000000003291926.checkpoint.3c9a0512-....json
7616 00000000000003291936.checkpoint.ba87e77a-....json
7653 00000000000003291936.checkpoint.parquet
7616 00000000000003291946.checkpoint.daf933a4-....json
7616 00000000000003291956.checkpoint.80768fb1-....json
7614 00000000000003291961.checkpoint.59ad2faf-....json
7614 00000000000003291971.checkpoint.ddb7a4f4-....json
7614 00000000000003291981.checkpoint.45867b1a-....json
7614 00000000000003291991.checkpoint.ec13fc70-....json

Why does it mix classic and v2 checkpoints together?

Thanks

 


Panda
Valued Contributor
 
Your Delta log shows the two checkpoint flavors the protocol allows, side by side. The UUID-named JSON files (<version>.checkpoint.<uuid>.json) are v2 checkpoints, while <version>.checkpoint.parquet is a classic checkpoint; notice the Parquet file shares version 3291936 with one of the JSON files. V2 checkpoints were introduced in Delta Lake 3.0 to speed up log processing (among other things, they allow the checkpoint to be split into sidecar files). When a table's checkpoint policy is v2, classic checkpoints remain valid, and the writer still emits one periodically alongside the v2 checkpoints, so seeing both formats interleaved in _delta_log is expected behavior, not a misconfiguration.
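To see which is which, here is a minimal sketch (assuming a Databricks notebook where dbutils is available; the table path is a placeholder) that lists _delta_log and buckets the checkpoint files by their naming pattern:

import re

# Placeholder: point this at your table's _delta_log directory
files = dbutils.fs.ls("s3://<bucket>/<table_path>/_delta_log/")

for f in files:
    # v2 checkpoints carry a UUID in the file name; classic ones do not
    if re.fullmatch(r"\d{20}\.checkpoint\.[0-9a-f-]{36}\.(json|parquet)", f.name):
        print("v2 checkpoint:     ", f.name)
    elif re.fullmatch(r"\d{20}\.checkpoint\.parquet", f.name):
        print("classic checkpoint:", f.name)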
 
To make the v2 checkpoints consistently use Parquet for the top-level (UUID-named) file as well, OSS Delta exposes a writer-side setting:
spark.conf.set("spark.databricks.delta.checkpointV2.topLevelFileFormat", "parquet")
Note that delta.checkpoint.writeStatsAsStruct is a separate table property that controls how per-file statistics are stored inside checkpoints; it does not change the checkpoint file format.
 
 
For compatibility with jobs or tools that only understand the classic (non-UUID, Parquet) checkpoint format, you can switch the table back to the classic policy instead:
spark.sql("ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.checkpointPolicy' = 'classic')")
 
Cleanup:
OPTIMIZE and VACUUM maintain the table's data files (compacting small files and removing unreferenced ones); they don't rewrite the log. Old checkpoint and commit files in _delta_log are expired automatically based on delta.logRetentionDuration (30 days by default) each time a new checkpoint is written, so no manual log compaction is needed.
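For example (table name is a placeholder; VACUUM refuses retention below 7 days unless you override the safety check):

spark.sql("OPTIMIZE <table_name>")                  # compact small data files
spark.sql("VACUUM <table_name> RETAIN 168 HOURS")   # drop unreferenced files older than 7 days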
 
Recommendation:
If you are on Databricks Runtime 15.4 LTS or 14.3 LTS, both of which support v2 checkpoints, I recommend staying on the v2 policy to benefit from the faster log processing.
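If a table isn't on the v2 policy yet, you can enable it explicitly; a sketch (table name is a placeholder, and note this adds the v2Checkpoint feature to the table protocol, which all readers of the table must then support):

spark.sql("ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.checkpointPolicy' = 'v2')")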

Brad
Contributor II

Thanks. We use a job to load data from Kinesis into a Delta table. I added

spark.databricks.delta.checkpointV2.topLevelFileFormat parquet

in the job cluster, but the checkpoints still show mixed formats. The table properties include:

delta.checkpointPolicy	v2
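which can be confirmed with something like (table name is a placeholder):

spark.sql("SHOW TBLPROPERTIES <table_name>").show(truncate=False)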



 

 
