Hello!
I'm using Structured Streaming to write to a Delta table. The source is another Delta table that is also written with Structured Streaming. To validate the results, I'm trying to extract, from the target table's checkpoint files, the version of the source table that was processed in each run.
Inspecting the checkpoint files, I see two patterns:
```json
{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3716,"index":5285,"isStartingVersion":true}
{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3719,"index":-1,"isStartingVersion":false}
```
From the cases I've seen so far, `reservoirVersion` appears to refer to the version of the source table, except that it must be decremented by 1 when `index` = -1 and kept as is when `index` is a positive number.
In these examples:
- The first one read version `3716` of the source table
- The second one read version `3718` of the source table (adjusted from `reservoirVersion` because `index` = -1)
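To make the rule concrete, here is a minimal Python sketch of the heuristic above. It encodes only the observed behavior, not documented Delta semantics. The function name `source_version_from_offset` is hypothetical, and treating `index` = 0 the same as a positive index is an assumption.

```python
import json


def source_version_from_offset(offset_json: str) -> int:
    """Apply the observed (undocumented) heuristic to one offset line:
    - index == -1 -> the batch read up to reservoirVersion - 1
    - index >= 0  -> the batch read reservoirVersion itself (0 included by assumption)
    """
    offset = json.loads(offset_json)
    version = offset["reservoirVersion"]
    index = offset["index"]
    if index == -1:
        return version - 1
    if index >= 0:
        return version
    # Any other value does not match the patterns seen so far.
    raise ValueError(f"unexpected index value: {index}")


first = '{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3716,"index":5285,"isStartingVersion":true}'
second = '{"sourceVersion":1,"reservoirId":"4121e6a2-ab1a-4f6c-8217-6412909486c0","reservoirVersion":3719,"index":-1,"isStartingVersion":false}'
print(source_version_from_offset(first))   # 3716
print(source_version_from_offset(second))  # 3718
```

Raising on an unexpected `index` value, rather than guessing, keeps the data check from silently producing a wrong version if a new pattern shows up.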
It also seems that `index` is always -1 except in the first checkpoint file of a stream (which also has `isStartingVersion` = true).
These assumptions held for every file I checked. In particular, whenever `index` was -1, `reservoirVersion` was always one above the last available version of the source table.
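To check this across a whole checkpoint, a sketch like the following could flag any offset that breaks the observed pattern instead of assuming it never happens. It assumes the usual Structured Streaming checkpoint layout, where each file under `offsets/` is named after its batch ID and contains a version line (such as `v1`), a metadata JSON line, and then one offset line per source. The function `scan_offsets` is hypothetical.

```python
import json
import os
from typing import Dict, List


def scan_offsets(checkpoint_dir: str) -> List[Dict]:
    """Collect Delta source offsets from <checkpoint_dir>/offsets.

    Assumed file layout: version line, metadata JSON line, then one
    offset line per source.
    """
    offsets_dir = os.path.join(checkpoint_dir, "offsets")
    # Batch files are named with plain integers; skip others (e.g. ".0.crc").
    batch_ids = sorted(int(n) for n in os.listdir(offsets_dir) if n.isdigit())
    rows = []
    for batch_id in batch_ids:
        with open(os.path.join(offsets_dir, str(batch_id))) as f:
            lines = [line.strip() for line in f if line.strip()]
        for source_line in lines[2:]:
            try:
                offset = json.loads(source_line)
            except json.JSONDecodeError:
                continue  # not a JSON offset line
            if not isinstance(offset, dict) or "reservoirVersion" not in offset:
                continue  # not a Delta source offset
            index = offset["index"]
            starting = offset.get("isStartingVersion", False)
            rows.append({
                "batch_id": batch_id,
                "reservoirVersion": offset["reservoirVersion"],
                "index": index,
                "isStartingVersion": starting,
                # Observed pattern: index is -1 unless this is the starting version.
                "matches_pattern": index == -1 or starting,
            })
    return rows
```

Calling `scan_offsets` on the target table's checkpoint path and filtering the rows where `matches_pattern` is false would surface any batch that does not fit the pattern described above.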
I couldn't find any documentation backing up this logic.
Could you help me confirm whether this reasoning is correct and whether it will keep working this way in all future runs?
If not, could other patterns appear in these files?
Is there any documentation explaining the meaning of each of these fields?
Thank you for your help!