<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Why there are many offsets in checkpoint in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/why-there-are-many-offsets-in-checkpoint/m-p/93772#M38765</link>
    <description>&lt;P&gt;Hi team,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm using trigger=availableNow to read a Delta table daily. The Delta table itself is loaded by Structured Streaming from Kinesis. I noticed there are many offset files under the checkpoint, and when the job starts reading from the Delta table, I can see this in the log:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;BatchIds found from listing: 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;These batch IDs match the offset files. Is it supposed to read only the last offset from the checkpoint when reading from the Delta table, or am I misunderstanding something here?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Mon, 14 Oct 2024 07:19:46 GMT</pubDate>
    <dc:creator>MikeGo</dc:creator>
    <dc:date>2024-10-14T07:19:46Z</dc:date>
    <item>
      <title>Why there are many offsets in checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/why-there-are-many-offsets-in-checkpoint/m-p/93772#M38765</link>
      <description>&lt;P&gt;Hi team,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm using trigger=availableNow to read a Delta table daily. The Delta table itself is loaded by Structured Streaming from Kinesis. I noticed there are many offset files under the checkpoint, and when the job starts reading from the Delta table, I can see this in the log:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;BatchIds found from listing: 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;These batch IDs match the offset files. Is it supposed to read only the last offset from the checkpoint when reading from the Delta table, or am I misunderstanding something here?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2024 07:19:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-there-are-many-offsets-in-checkpoint/m-p/93772#M38765</guid>
      <dc:creator>MikeGo</dc:creator>
      <dc:date>2024-10-14T07:19:46Z</dc:date>
    </item>
    <item>
      <title>Re: Why there are many offsets in checkpoint</title>
      <link>https://community.databricks.com/t5/data-engineering/why-there-are-many-offsets-in-checkpoint/m-p/93838#M38772</link>
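      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/100643"&gt;@MikeGo&lt;/a&gt;&amp;nbsp; The batch IDs listed in the log (186, 187, 188, ...) correspond to micro-batches that your query has already planned and processed. Each batch ID marks one micro-batch: the range of source data (here, versions of the source Delta table) that was read, transformed, and written to the sink.&lt;/P&gt;&lt;P&gt;Each file under the checkpoint's offsets directory records the source offsets for one batch ID. When the query starts, it lists these files, takes the latest batch ID, and checks it against the commit log: if that batch was committed, it plans the next batch; otherwise it re-runs it. So yes, the query resumes from the last offset, and this is what lets it restart from the last successful batch after a failure. The older files are not re-read as input. By default Spark retains metadata for roughly the last 100 batches (spark.sql.streaming.minBatchesToRetain) and purges older ones automatically, which is why the listing shows about a hundred IDs (186 to 286) rather than just one.&lt;/P&gt;</description>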
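A minimal Python sketch (not Spark's code or API) of the checkpoint bookkeeping discussed in this thread, assuming the usual layout where the checkpoint's offsets/ and commits/ directories hold one file per batch, named by batch ID. The helpers list_batch_ids, next_batch_to_run and retained_batch_ids are hypothetical; the retention rule is an approximation modelled on the default spark.sql.streaming.minBatchesToRetain of 100.

```python
import os
import tempfile


def list_batch_ids(offsets_dir):
    """Return the numeric batch IDs found in a checkpoint's offsets directory.

    Hypothetical helper: assumes one file per batch, named by its ID
    (e.g. offsets/186). Hidden checksum/temp files such as .186.crc are skipped.
    """
    return sorted(int(name) for name in os.listdir(offsets_dir) if name.isdigit())


def next_batch_to_run(offset_ids, commit_ids):
    """Approximate restart rule: only the latest planned batch matters."""
    if not offset_ids:
        return 0  # empty checkpoint: the query starts from batch 0
    latest = max(offset_ids)
    if latest in commit_ids:
        return latest + 1  # latest batch finished, so plan a new one
    return latest  # latest batch was planned but never committed: re-run it


def retained_batch_ids(current, min_batches_to_retain=100):
    """Approximate purge rule (assumption, modelled on the default
    spark.sql.streaming.minBatchesToRetain = 100): IDs below
    current - min_batches_to_retain are deleted, everything else is kept."""
    threshold = max(current - min_batches_to_retain, 0)
    return list(range(threshold, current + 1))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as checkpoint:
        offsets_dir = os.path.join(checkpoint, "offsets")
        os.makedirs(offsets_dir)
        # Simulate the listing from the log: batches 186..286 plus a checksum file.
        for batch_id in retained_batch_ids(286):
            open(os.path.join(offsets_dir, str(batch_id)), "w").close()
        open(os.path.join(offsets_dir, ".286.crc"), "w").close()

        ids = list_batch_ids(offsets_dir)
        print(len(ids), ids[0], ids[-1])  # 101 186 286
        # Only the latest ID decides where the query resumes.
        print(next_batch_to_run(ids, commit_ids=set(ids)))  # 287
```

The sketch keeps the two concerns apart on purpose: next_batch_to_run is the only part that decides where reading resumes, while retained_batch_ids only explains why about a hundred older files stay on disk. In a real job Spark manages these files itself; you would not create or delete them by hand.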
      <pubDate>Mon, 14 Oct 2024 10:01:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/why-there-are-many-offsets-in-checkpoint/m-p/93838#M38772</guid>
      <dc:creator>Rishabh-Pandey</dc:creator>
      <dc:date>2024-10-14T10:01:31Z</dc:date>
    </item>
  </channel>
</rss>

