Your Delta table's _delta_log shows a mix of formats because the log always contains per-commit JSON files alongside periodic Parquet checkpoints, and your table appears to be transitioning between classic checkpoints and the newer v2 checkpoint format. This behavior is managed by Delta Lake table features: classic checkpoints are single (or multi-part) Parquet files, while v2 checkpoints, introduced in Delta Lake 3.0 via the v2Checkpoint table feature, split the checkpoint into a top-level manifest (JSON or Parquet) plus Parquet sidecar files, which speeds up log replay on large tables.
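You can confirm which style a table is using by listing the log directory (the path below is a placeholder for your table's location):

# List _delta_log contents on Databricks:
#   NNNNN.json                      -> per-commit log entries
#   NNNNN.checkpoint.parquet        -> classic checkpoint
#   NNNNN.checkpoint.<uuid>.parquet -> v2 checkpoint manifest
#   _sidecars/                      -> v2 checkpoint sidecar files
for f in dbutils.fs.ls("/mnt/data/my_table/_delta_log"):
    print(f.name)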
To consistently use v2 checkpoints, set the delta.checkpointPolicy table property (my_table is a placeholder below); setting delta.checkpoint.writeStatsAsStruct at the same time keeps file statistics in a struct column, which checkpoints read faster than JSON-encoded stats:

spark.sql("ALTER TABLE my_table SET TBLPROPERTIES ('delta.checkpointPolicy' = 'v2', 'delta.checkpoint.writeStatsAsStruct' = 'true')")
For compatibility with older jobs, bear in mind that enabling v2 checkpoints adds the v2Checkpoint table feature, which clients on older runtimes cannot read. To enforce the classic format again, switch the policy back:

spark.sql("ALTER TABLE my_table SET TBLPROPERTIES ('delta.checkpointPolicy' = 'classic')")
Cleanup:
Checkpoint cleanup is automatic: Delta removes old log entries and checkpoints based on the table's delta.logRetentionDuration and delta.checkpointRetentionDuration settings. VACUUM and OPTIMIZE operate on the table's data files (removing unreferenced files and compacting small ones, respectively), not on the _delta_log, so run them for general table hygiene rather than to compact the log.
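A typical maintenance pass looks like this (table name and retention values are illustrative; the defaults are 30 days for log entries and 2 days for checkpoints):

# Compact small data files, then delete data files no longer referenced by the log
spark.sql("OPTIMIZE my_table")
spark.sql("VACUUM my_table")  # only removes files older than the retention threshold (default 7 days)

# Optionally tune how long log entries and checkpoints are retained
spark.sql("ALTER TABLE my_table SET TBLPROPERTIES ("
          "'delta.logRetentionDuration' = 'interval 30 days', "
          "'delta.checkpointRetentionDuration' = 'interval 2 days')")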
Recommendation:
If you are using Databricks Runtime 15.4 LTS or 14.3 LTS, both of which support the v2Checkpoint feature, I recommend fully switching to v2 checkpoints to benefit from faster log processing, provided no older readers still need access to the table.
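If you want new tables to adopt v2 checkpoints without a per-table ALTER, Delta supports setting default table properties through the spark.databricks.delta.properties.defaults.* prefix; this sketch assumes that prefix applies to checkpointPolicy the same way it does to other table properties:

# Assumption: the defaults prefix covers checkpointPolicy; newly created tables
# would then start with delta.checkpointPolicy = 'v2'
spark.conf.set("spark.databricks.delta.properties.defaults.checkpointPolicy", "v2")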