<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic WAL for structured streaming in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/wal-for-structured-streaming/m-p/63727#M32334</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I cannot find a deep dive on this in the latest resources. My understanding so far is:&lt;/P&gt;&lt;P&gt;Previously, SS (Structured Streaming) copied and cached the data in the WAL (write-ahead log). Since a certain version, with the receiver-less approach, SS no longer copies the data to the WAL and stores only the offsets, so the WAL is no longer used and recovery depends only on the checkpoint. Is this understanding right?&lt;/P&gt;</description>
    <pubDate>Thu, 14 Mar 2024 18:17:43 GMT</pubDate>
    <dc:creator>MikeGo</dc:creator>
    <dc:date>2024-03-14T18:17:43Z</dc:date>
    <item>
      <title>WAL for structured streaming</title>
      <link>https://community.databricks.com/t5/data-engineering/wal-for-structured-streaming/m-p/63727#M32334</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I cannot find a deep dive on this in the latest resources. My understanding so far is:&lt;/P&gt;&lt;P&gt;Previously, SS (Structured Streaming) copied and cached the data in the WAL (write-ahead log). Since a certain version, with the receiver-less approach, SS no longer copies the data to the WAL and stores only the offsets, so the WAL is no longer used and recovery depends only on the checkpoint. Is this understanding right?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Mar 2024 18:17:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/wal-for-structured-streaming/m-p/63727#M32334</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-03-14T18:17:43Z</dc:date>
    </item>
    <item>
      <title>Re: WAL for structured streaming</title>
      <link>https://community.databricks.com/t5/data-engineering/wal-for-structured-streaming/m-p/63840#M32378</link>
      <description>&lt;P&gt;Thanks, Kaniz.&lt;/P&gt;&lt;P&gt;Theoretically, even without the WAL, everything could be recovered from the checkpoint, right? Does the WAL exist only for performance reasons? For example, within a single micro-batch, might Spark run multiple smaller batches and use the WAL to record the state of each of those sub-batches?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 17:05:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/wal-for-structured-streaming/m-p/63840#M32378</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-03-15T17:05:23Z</dc:date>
    </item>
  </channel>
</rss>

