<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic why delta log checkpoint is created in different formats in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/why-delta-log-checkpoint-is-created-in-different-formats/m-p/95391#M39091</link>
    <description>&lt;P&gt;Hi,&lt;BR /&gt;I'm using runtime&amp;nbsp;&lt;SPAN&gt;15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found the Delta log checkpoints are written in mixed formats, like:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;7616 00000000000003291896.checkpoint.b1c24725-....json
7616 00000000000003291906.checkpoint.873e1b3e-....json
7616 00000000000003291916.checkpoint.e14e7613-....json
7616 00000000000003291926.checkpoint.3c9a0512-....json
7616 00000000000003291936.checkpoint.ba87e77a-....json
7653 00000000000003291936.checkpoint.parquet
7616 00000000000003291946.checkpoint.daf933a4-....json
7616 00000000000003291956.checkpoint.80768fb1-....json
7614 00000000000003291961.checkpoint.59ad2faf-....json
7614 00000000000003291971.checkpoint.ddb7a4f4-....json
7614 00000000000003291981.checkpoint.45867b1a-....json
7614 00000000000003291991.checkpoint.ec13fc70-....json&lt;/LI-CODE&gt;&lt;P&gt;Why does it mix classic and v2 checkpoints together?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Mon, 21 Oct 2024 22:55:53 GMT</pubDate>
    <dc:creator>MikeGo</dc:creator>
    <dc:date>2024-10-21T22:55:53Z</dc:date>
    <item>
      <title>why delta log checkpoint is created in different formats</title>
      <link>https://community.databricks.com/t5/data-engineering/why-delta-log-checkpoint-is-created-in-different-formats/m-p/95391#M39091</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I'm using runtime&amp;nbsp;&lt;SPAN&gt;15.4 LTS or 14.3 LTS. When loading a Delta Lake table from Kinesis, I found the Delta log checkpoints are written in mixed formats, like:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;7616 00000000000003291896.checkpoint.b1c24725-....json
7616 00000000000003291906.checkpoint.873e1b3e-....json
7616 00000000000003291916.checkpoint.e14e7613-....json
7616 00000000000003291926.checkpoint.3c9a0512-....json
7616 00000000000003291936.checkpoint.ba87e77a-....json
7653 00000000000003291936.checkpoint.parquet
7616 00000000000003291946.checkpoint.daf933a4-....json
7616 00000000000003291956.checkpoint.80768fb1-....json
7614 00000000000003291961.checkpoint.59ad2faf-....json
7614 00000000000003291971.checkpoint.ddb7a4f4-....json
7614 00000000000003291981.checkpoint.45867b1a-....json
7614 00000000000003291991.checkpoint.ec13fc70-....json&lt;/LI-CODE&gt;&lt;P&gt;Why does it mix classic and v2 checkpoints together?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 22:55:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-delta-log-checkpoint-is-created-in-different-formats/m-p/95391#M39091</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-10-21T22:55:53Z</dc:date>
    </item>
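The file names in the listing above already encode which checkpoint flavor each file is: per the Delta transaction-log protocol, a classic checkpoint is named `version.checkpoint.parquet`, while a v2 checkpoint inserts a UUID before the extension and may be JSON or Parquet. A minimal sketch that classifies `_delta_log` file names on that basis (pure Python; the full UUIDs are hypothetical stand-ins for the truncated ones in the post):

```python
import re

# Naming rules from the Delta transaction-log protocol:
#   classic checkpoint:  <version>.checkpoint.parquet
#   v2 checkpoint:       <version>.checkpoint.<uuid>.json  or  .parquet
CLASSIC = re.compile(r"^\d+\.checkpoint\.parquet$")
V2 = re.compile(r"^\d+\.checkpoint\.[0-9a-fA-F-]+\.(?:json|parquet)$")

def checkpoint_kind(name: str) -> str:
    """Classify a _delta_log file name as 'classic', 'v2', or 'other'."""
    if CLASSIC.match(name):
        return "classic"
    if V2.match(name):
        return "v2"
    return "other"

# The lone plain .parquet entry in the thread's listing is a classic
# checkpoint; the UUID-named .json entries are v2 checkpoints.
print(checkpoint_kind("00000000000003291936.checkpoint.parquet"))  # classic
print(checkpoint_kind(
    "00000000000003291936.checkpoint.ba87e77a-0000-4000-8000-000000000000.json"))  # v2
```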
    <item>
      <title>Re: why delta log checkpoint is created in different formats</title>
      <link>https://community.databricks.com/t5/data-engineering/why-delta-log-checkpoint-is-created-in-different-formats/m-p/95450#M39104</link>
      <description>&lt;DIV&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/100643"&gt;@MikeGo&lt;/a&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Your Delta log shows a mix of v2 checkpoint files (the UUID-named .json files) and a classic checkpoint file (the plain .checkpoint.parquet). Classic checkpoints are single Parquet files, while the newer v2 checkpoint format adds a UUID to the file name and can be written as JSON or Parquet; v2 checkpoints were introduced to improve performance through faster, incremental reading and writing of the transaction log.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;When a table's delta.checkpointPolicy is set to v2, Delta also periodically writes a classic-format Parquet checkpoint alongside the v2 checkpoints, so that readers that do not support v2 checkpoints can still read the table. That is why you see both formats side by side.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;To steer the checkpoint format that Delta writes, you can try the following configurations:&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;&lt;STRONG&gt;spark.conf.set("spark.databricks.delta.checkpoint.writeStatsAsStruct", "true")&lt;/STRONG&gt;&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;&lt;STRONG&gt;spark.conf.set("spark.databricks.delta.checkpoint.writeFormat", "parquet")&lt;/STRONG&gt;&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;For older jobs, you can switch the format to JSON instead:&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;&lt;STRONG&gt;spark.conf.set("spark.databricks.delta.checkpoint.writeFormat", "json")&lt;/STRONG&gt;&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Cleanup:&lt;/DIV&gt;&lt;DIV&gt;To keep the table tidy, run OPTIMIZE and VACUUM on it regularly.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Recommendation:&lt;/DIV&gt;&lt;DIV&gt;If you are using Databricks Runtime 15.4 LTS or 14.3 LTS, I recommend fully switching to v2 checkpoints to benefit from faster log processing.&lt;/DIV&gt;</description>
      <pubDate>Tue, 22 Oct 2024 10:28:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-delta-log-checkpoint-is-created-in-different-formats/m-p/95450#M39104</guid>
      <dc:creator>Panda</dc:creator>
      <dc:date>2024-10-22T10:28:58Z</dc:date>
    </item>
    <item>
      <title>Re: why delta log checkpoint is created in different formats</title>
      <link>https://community.databricks.com/t5/data-engineering/why-delta-log-checkpoint-is-created-in-different-formats/m-p/96043#M39201</link>
      <description>&lt;P&gt;Thanks. We use a job to load data from Kinesis into a Delta table. I added&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.databricks.delta.checkpoint.writeFormat parquet
spark.databricks.delta.checkpoint.writeStatsAsStruct true&lt;/LI-CODE&gt;&lt;P&gt;to the job cluster, but the checkpoints still show different formats. The table properties include:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;delta.checkpointPolicy	v2&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 24 Oct 2024 18:39:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-delta-log-checkpoint-is-created-in-different-formats/m-p/96043#M39201</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-10-24T18:39:13Z</dc:date>
    </item>
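Given that the table in the follow-up has delta.checkpointPolicy set to v2, one way to confirm what the mixed listing shows is to group checkpoint files by table version: a version that appears with both a UUID-named file and a plain `.checkpoint.parquet` got a v2 checkpoint plus a classic-format compatibility checkpoint. A pure-Python sketch (the full UUIDs below are hypothetical stand-ins for the truncated names quoted in the question):

```python
from collections import defaultdict

def checkpoint_formats_by_version(names):
    """Group _delta_log checkpoint file names by table version.

    Returns {version: set of formats}: a plain <version>.checkpoint.parquet
    counts as 'classic', a UUID-named file as 'v2'.  (Multi-part classic
    checkpoints are not distinguished here.)
    """
    seen = defaultdict(set)
    for name in names:
        parts = name.split(".")
        if len(parts) >= 3 and parts[1] == "checkpoint":
            kind = "classic" if len(parts) == 3 and parts[-1] == "parquet" else "v2"
            seen[int(parts[0])].add(kind)
    return dict(seen)

listing = [
    "00000000000003291926.checkpoint.3c9a0512-0000-4000-8000-000000000000.json",
    "00000000000003291936.checkpoint.ba87e77a-0000-4000-8000-000000000000.json",
    "00000000000003291936.checkpoint.parquet",
    "00000000000003291946.checkpoint.daf933a4-0000-4000-8000-000000000000.json",
]
# Version 3291936 shows both formats: a v2 checkpoint plus the
# classic-format compatibility checkpoint written alongside it.
print(checkpoint_formats_by_version(listing))
```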
  </channel>
</rss>

