<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Obtain the source table version number from checkpoint file when using Structured Streaming in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/obtain-the-source-table-version-number-from-checkpoint-file-when/m-p/89067#M37678</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;: it looks like you were trying to help, but when I read through your answer, it seems that you just repeated the information shared by Agus1. It would be much better if you just acknowledged that you have no idea about this question.&lt;/P&gt;</description>
    <pubDate>Sun, 08 Sep 2024 09:13:49 GMT</pubDate>
    <dc:creator>erssiws</dc:creator>
    <dc:date>2024-09-08T09:13:49Z</dc:date>
    <item>
      <title>Obtain the source table version number from checkpoint file when using Structured Streaming</title>
      <link>https://community.databricks.com/t5/data-engineering/obtain-the-source-table-version-number-from-checkpoint-file-when/m-p/50681#M28862</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;I'm using Structured Streaming to write to a delta table. The source is another delta table written with Structured Streaming as well. In order to datacheck the results I'm attempting to obtain from the checkpoint files of the target table the &lt;STRONG&gt;version number of the source table used to process each run.&amp;nbsp;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;When inspecting the checkpoint files I recognize two possible patterns:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3716,"index":5285,"isStartingVersion":true}&lt;/LI-CODE&gt;&lt;LI-CODE lang="javascript"&gt;{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3719,"index":-1,"isStartingVersion":false}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From the cases I've seen so far, it seems like the `reservoirVersion` value refers to the version of the source table. 
But this value should be reduced by 1 when `index` = -1, and kept as is when `index` is a positive number.&lt;/P&gt;&lt;P&gt;In these examples:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The first one read version `3716` of the source table&lt;/LI&gt;&lt;LI&gt;The second one read version `3718` of the source table (`reservoirVersion` minus 1, because `index` = -1)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Also, it seems that `index` is always -1 except for the first checkpoint file of a stream (which also contains `isStartingVersion` = true).&lt;/P&gt;&lt;P&gt;I was able to verify these assumptions for every file I've checked. In particular, whenever `index` was -1, the value of `reservoirVersion` was always 1 above the last available version of the source table.&lt;/P&gt;&lt;P&gt;I couldn't find any documentation backing up this logic.&lt;BR /&gt;Could you help me confirm whether this reasoning is correct and whether it will continue to work like this for all future runs?&lt;BR /&gt;If not, could another pattern appear in these files?&lt;BR /&gt;Is there any documentation explaining the meaning of each of these fields?&lt;/P&gt;&lt;P&gt;Thank you for your help!&lt;/P&gt;</description>
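      <!--
The rule described in the post above can be sketched in Python as follows. This is a minimal sketch that encodes only the pattern observed in the thread (subtract 1 from reservoirVersion when index is -1, otherwise use it as is); it is not documented Delta Lake behavior. The function name last_processed_source_version is hypothetical, and the input is assumed to be one offset entry as a JSON string, exactly like the two examples quoted in the post.

```python
import json


def last_processed_source_version(offset_json: str) -> int:
    """Return the source table version implied by one checkpoint offset entry.

    Heuristic taken from the thread (an observation, not a documented contract):
      index == -1  : reservoirVersion is one above the last version read
      otherwise    : reservoirVersion is the version that was read
    """
    offset = json.loads(offset_json)
    version = offset["reservoirVersion"]
    return version - 1 if offset["index"] == -1 else version


# The two offset entries quoted in the post.
first = (
    '{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0",'
    '"reservoirVersion":3716,"index":5285,"isStartingVersion":true}'
)
later = (
    '{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0",'
    '"reservoirVersion":3719,"index":-1,"isStartingVersion":false}'
)

print(last_processed_source_version(first))  # 3716
print(last_processed_source_version(later))  # 3718
```

Keeping the rule in a single function means that if a different pattern shows up later, only this one place needs to change.
      -->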
      <pubDate>Wed, 08 Nov 2023 18:09:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/obtain-the-source-table-version-number-from-checkpoint-file-when/m-p/50681#M28862</guid>
      <dc:creator>Agus1</dc:creator>
      <dc:date>2023-11-08T18:09:14Z</dc:date>
    </item>
    <item>
      <title>Re: Obtain the source table version number from checkpoint file when using Structured Streaming</title>
      <link>https://community.databricks.com/t5/data-engineering/obtain-the-source-table-version-number-from-checkpoint-file-when/m-p/54109#M29990</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thank you for your answer.&lt;/P&gt;&lt;P&gt;I'm a bit confused here because you seem to be describing the opposite of the behavior I've seen in our checkpoint files.&lt;/P&gt;&lt;P&gt;I'm reposting my examples here to try to understand this better.&lt;/P&gt;&lt;P&gt;First checkpoint file:&lt;/P&gt;&lt;PRE&gt;{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3716,"index":5285,"isStartingVersion":true}&lt;/PRE&gt;&lt;P&gt;All following checkpoint files:&lt;/P&gt;&lt;PRE&gt;{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3719,"index":-1,"isStartingVersion":false}&lt;/PRE&gt;&lt;P&gt;When you mention &lt;EM&gt;"when index is &lt;STRONG&gt;-1&lt;/STRONG&gt;, it signifies the first checkpoint file of a stream. In this scenario, the reservoirVersion should be adjusted by adding 1 to it.":&lt;/EM&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;As I describe in my original post, when I see index = -1 I need to &lt;STRONG&gt;subtract&lt;/STRONG&gt; 1 from the reservoirVersion, &lt;STRONG&gt;not add&lt;/STRONG&gt; 1, because the version present in the file doesn't even exist yet (it's 1 version above the last available version of the table). In my examples this is the second one, where reservoirVersion is &lt;STRONG&gt;3719&lt;/STRONG&gt; and I need to subtract 1 because this version doesn't exist.&lt;/LI&gt;&lt;LI&gt;You say &lt;EM&gt;"it signifies the first checkpoint file of a stream"&lt;/EM&gt;, but in all the files I've seen in my implementations, the index is &lt;STRONG&gt;never&lt;/STRONG&gt; -1 for the first file. 
Actually, the first file is the only one where I don't see index = -1.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;From your comment regarding &lt;STRONG&gt;isStartingVersion&lt;/STRONG&gt;: &lt;EM&gt;"This field is present only in the first checkpoint file of a stream (where index = -1)":&lt;/EM&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;As you can see in my first example, marked as "First checkpoint file", &lt;STRONG&gt;isStartingVersion&lt;/STRONG&gt; is &lt;STRONG&gt;true&lt;/STRONG&gt; and index &lt;STRONG&gt;is not -1&lt;/STRONG&gt;. Also, &lt;EM&gt;reservoirVersion&lt;/EM&gt; points to the correct version of the table (no need to subtract 1). As I mentioned before, this is the behavior I've seen for all first checkpoint files. (First shared example)&lt;/LI&gt;&lt;LI&gt;For all following files, &lt;STRONG&gt;isStartingVersion&lt;/STRONG&gt; is always &lt;STRONG&gt;false&lt;/STRONG&gt;, &lt;STRONG&gt;index&lt;/STRONG&gt; is always &lt;STRONG&gt;-1&lt;/STRONG&gt;, and the &lt;STRONG&gt;&lt;EM&gt;reservoirVersion&lt;/EM&gt;&lt;/STRONG&gt; always needs to be adjusted by &lt;STRONG&gt;subtracting 1&lt;/STRONG&gt;. (Second shared example)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Can you help me understand whether this behavior is normal, or why I might be experiencing something different from what you describe?&lt;/P&gt;&lt;P&gt;Thank you very much for your help.&lt;/P&gt;</description>
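      <!--
Since the pattern described in this reply is an observation and not documented behavior, one cautious option is to check each stream's offset entries against it before trusting any derived version number. The sketch below assumes the entries are available as JSON strings in checkpoint order; the function name pattern_violations is hypothetical, and the expected pattern is exactly the one reported in this thread (first entry: isStartingVersion true and index not -1; later entries: isStartingVersion false and index -1).

```python
import json
from typing import List


def pattern_violations(offset_entries: List[str]) -> List[str]:
    """Describe every entry that deviates from the pattern reported in the thread."""
    problems = []
    for position, raw in enumerate(offset_entries):
        entry = json.loads(raw)
        starting = entry["isStartingVersion"]
        index_is_minus_one = entry["index"] == -1
        if position == 0:
            # First checkpoint file: starting version, index pointing inside it.
            if not starting or index_is_minus_one:
                problems.append(f"entry {position}: expected isStartingVersion=true and index != -1")
        else:
            # All following files: not a starting version, index equal to -1.
            if starting or not index_is_minus_one:
                problems.append(f"entry {position}: expected isStartingVersion=false and index = -1")
    return problems


# The two offset entries quoted in this reply, in checkpoint order.
first = (
    '{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0",'
    '"reservoirVersion":3716,"index":5285,"isStartingVersion":true}'
)
later = (
    '{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0",'
    '"reservoirVersion":3719,"index":-1,"isStartingVersion":false}'
)

print(pattern_violations([first, later]))  # []
```

An empty list means every entry matches what was observed here; any message in the list marks a file worth inspecting by hand before applying the subtract 1 adjustment.
      -->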
      <pubDate>Tue, 28 Nov 2023 15:01:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/obtain-the-source-table-version-number-from-checkpoint-file-when/m-p/54109#M29990</guid>
      <dc:creator>Agus1</dc:creator>
      <dc:date>2023-11-28T15:01:31Z</dc:date>
    </item>
    <item>
      <title>Re: Obtain the source table version number from checkpoint file when using Structured Streaming</title>
      <link>https://community.databricks.com/t5/data-engineering/obtain-the-source-table-version-number-from-checkpoint-file-when/m-p/89067#M37678</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;: it looks like you were trying to help, but when I read through your answer, it seems that you just repeated the information shared by Agus1. It would be much better if you just acknowledged that you have no idea about this question.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Sep 2024 09:13:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/obtain-the-source-table-version-number-from-checkpoint-file-when/m-p/89067#M37678</guid>
      <dc:creator>erssiws</dc:creator>
      <dc:date>2024-09-08T09:13:49Z</dc:date>
    </item>
  </channel>
</rss>

