<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Delta lake Check points storage concept in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21828#M14917</link>
    <description>&lt;P&gt;In which format are checkpoints stored, and how do they help Delta Lake improve performance?&lt;/P&gt;</description>
    <pubDate>Tue, 22 Jun 2021 11:45:08 GMT</pubDate>
    <dc:creator>User16826994223</dc:creator>
    <dc:date>2021-06-22T11:45:08Z</dc:date>
    <item>
      <title>Delta lake Check points storage concept</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21828#M14917</link>
      <description>&lt;P&gt;In which format are checkpoints stored, and how do they help Delta Lake improve performance?&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 11:45:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21828#M14917</guid>
      <dc:creator>User16826994223</dc:creator>
      <dc:date>2021-06-22T11:45:08Z</dc:date>
    </item>
    <item>
      <title>Re: Delta lake Check points storage concept</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21829#M14918</link>
      <description>&lt;P&gt;Delta Lake writes&amp;nbsp;&lt;A href="https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints" alt="https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints" target="_blank"&gt;checkpoints&lt;/A&gt;&amp;nbsp;in Parquet format as an aggregate state of a Delta table, by default every 10 commits. These checkpoints serve as the starting point for computing the latest state of the table. Without checkpoints, Delta Lake would have to read a large collection of JSON files (“delta” files), each representing a commit to the transaction log, to compute the state of a table. In addition, the column-level statistics that Delta Lake uses to perform&amp;nbsp;&lt;A href="https://docs.databricks.com/delta/optimizations/file-mgmt.html#delta-data-skipping" alt="https://docs.databricks.com/delta/optimizations/file-mgmt.html#delta-data-skipping" target="_blank"&gt;data skipping&lt;/A&gt;&amp;nbsp;are stored in the checkpoint.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 11:45:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21829#M14918</guid>
      <dc:creator>User16826994223</dc:creator>
      <dc:date>2021-06-22T11:45:39Z</dc:date>
    </item>
    <item>
      <title>Re: Delta lake Check points storage concept</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21830#M14919</link>
      <description>&lt;P&gt;In Databricks Runtime 7.2 and below, column-level statistics are stored in Delta Lake checkpoints as a JSON column.&lt;/P&gt;&lt;P&gt;In Databricks Runtime 7.3 LTS and above, column-level statistics are stored as a struct. The struct format makes Delta Lake reads much faster, because:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Delta Lake doesn’t perform expensive JSON parsing to obtain column-level statistics.&lt;/LI&gt;&lt;LI&gt;Parquet column pruning capabilities significantly reduce the I/O required to read the statistics for a column.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The struct format enables a collection of optimizations that reduce the overhead of Delta Lake read operations from seconds to tens of milliseconds, which significantly reduces the latency for short queries.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jun 2021 11:46:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21830#M14919</guid>
      <dc:creator>User16826994223</dc:creator>
      <dc:date>2021-06-22T11:46:04Z</dc:date>
    </item>
    <item>
      <title>Re: Delta lake Check points storage concept</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21831#M14920</link>
      <description>&lt;P&gt;Great points above on how checkpointing helps with performance. In addition, Delta Lake provides other data organization strategies, such as compaction and Z-ordering, that help with both read and write performance of Delta tables. Additional details are here: &lt;A href="https://docs.databricks.com/delta/optimizations/file-mgmt.html" target="test_blank"&gt;https://docs.databricks.com/delta/optimizations/file-mgmt.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 04:14:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-lake-check-points-storage-concept/m-p/21831#M14920</guid>
      <dc:creator>aladda</dc:creator>
      <dc:date>2021-06-23T04:14:58Z</dc:date>
    </item>
  </channel>
</rss>

