<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: When delta is a streaming source, how can we get the consumer lag? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33468#M24457</link>
    <description>&lt;P&gt;Hi @Yerachmiel Feltzman&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You will need to take a look at the micro-batch metrics. This article explains in more detail what each metric means: &lt;A href="https://databricks.com/blog/2020/07/29/a-look-at-the-new-structured-streaming-ui-in-apache-spark-3-0.html" alt="https://databricks.com/blog/2020/07/29/a-look-at-the-new-structured-streaming-ui-in-apache-spark-3-0.html" target="_blank"&gt;https://databricks.com/blog/2020/07/29/a-look-at-the-new-structured-streaming-ui-in-apache-spark-3-0.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jan 2022 00:43:52 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2022-01-26T00:43:52Z</dc:date>
    <item>
      <title>When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33459#M24448</link>
      <description>&lt;P&gt;Hi, &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I want to keep track of the streaming lag from the source table, which is a Delta table. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I see that the query progress logs contain some information about the last version and the last file in the version for the end offset, but this doesn't give the lag from the source table unless I query the table and check what its last version and file count are.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;"sources" : [ {
    "description" : "DeltaSource[dbfs:/mnt/defaultDatalake/zones/bronze/my_source_table]",
    "startOffset" : {
      "sourceVersion" : 1,
      "reservoirId" : "15059b8a-0f48-4561-9424-8fcb0c8906de",
      "reservoirVersion" : 39673,
      "index" : -1,
      "isStartingVersion" : false
    },
    "endOffset" : {
      "sourceVersion" : 1,
      "reservoirId" : "15059b8a-0f48-4561-9424-8fcb0c8906de",
      "reservoirVersion" : 39674,
      "index" : -1,
      "isStartingVersion" : false
    },&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just to be clear, by lag I mean that if, for example, the last row in the source table is row 100 and the stream is currently processing row 90, my lag from the source table would be 10 rows.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;One more technical point: how can I parse the startOffset and endOffset? From the `SourceProgress` class I have direct access to the endOffset field, but not to its inner fields (like index). Should I just parse the endOffset string as JSON using a standard JSON library like Jackson or ujson?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much.&lt;/P&gt;</description>
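<!--
On the parsing question above: a minimal sketch, in Python rather than the Scala the post implies, of reading the inner fields of an offset with a standard JSON library. The helper name parse_delta_offset is hypothetical, and the sample string is copied from the progress log in the post. The sketch accepts either a JSON string or an already parsed dictionary, because which of the two you receive is assumed to depend on the API you read the progress from.

```python
import json


def parse_delta_offset(offset):
    """Return the offset as a dict, whether it arrives as a JSON string or a dict."""
    if isinstance(offset, str):
        return json.loads(offset)
    return dict(offset)


# Sample endOffset copied from the progress log quoted in the post.
end_offset_json = (
    '{"sourceVersion": 1, '
    '"reservoirId": "15059b8a-0f48-4561-9424-8fcb0c8906de", '
    '"reservoirVersion": 39674, "index": -1, "isStartingVersion": false}'
)

end_offset = parse_delta_offset(end_offset_json)
print(end_offset["reservoirVersion"], end_offset["index"])  # prints: 39674 -1
```

Parsing the string with a general JSON library keeps the code independent of internal Delta classes, at the cost of relying on field names that are not a documented contract.
-->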
      <pubDate>Thu, 09 Dec 2021 20:45:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33459#M24448</guid>
      <dc:creator>YFL</dc:creator>
      <dc:date>2021-12-09T20:45:43Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33461#M24450</link>
      <description>&lt;P&gt;Thanks, Kaniz. This is a highly important question for some production jobs we have (and we are highly invested in Databricks and Delta). I have seen others across the internet asking the same question as well.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;Yerachmiel Feltzman | Data Platform Developer&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Dec 2021 08:12:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33461#M24450</guid>
      <dc:creator>YFL</dc:creator>
      <dc:date>2021-12-10T08:12:16Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33462#M24451</link>
      <description>&lt;P&gt;Hi @Yerachmiel Feltzman&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can take a look at the following metrics in your streaming query progress: &lt;A href="https://docs.databricks.com/delta/delta-streaming.html#metrics" target="test_blank"&gt;https://docs.databricks.com/delta/delta-streaming.html#metrics&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Dec 2021 23:05:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33462#M24451</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-12-10T23:05:33Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33463#M24452</link>
      <description>&lt;P&gt;Hi, @Jose Gonzalez&amp;nbsp;, I don't see anything in the link that gives the lag from the source Delta table. &lt;/P&gt;&lt;P&gt;Thanks anyway.&lt;/P&gt;</description>
      <pubDate>Sun, 12 Dec 2021 09:09:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33463#M24452</guid>
      <dc:creator>YFL</dc:creator>
      <dc:date>2021-12-12T09:09:45Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33464#M24453</link>
      <description>&lt;P&gt;Hi @Yerachmiel Feltzman&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are you able to see these metrics?&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;  &lt;B&gt;"sources"&lt;/B&gt; : [&lt;/P&gt;&lt;P&gt;    {&lt;/P&gt;&lt;P&gt;      &lt;B&gt;"description"&lt;/B&gt; : "DeltaSource[file:/path/to/source]",&lt;/P&gt;&lt;P&gt;      &lt;B&gt;"metrics"&lt;/B&gt; : {&lt;/P&gt;&lt;P&gt;        &lt;B&gt;"numBytesOutstanding"&lt;/B&gt; : "3456",&lt;/P&gt;&lt;P&gt;        &lt;B&gt;"numFilesOutstanding"&lt;/B&gt; : "8"&lt;/P&gt;&lt;P&gt;      }&lt;/P&gt;&lt;P&gt;    }&lt;/P&gt;&lt;P&gt;  ]&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
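<!--
The two metrics in the reply above can be read as a backlog measure. The following is a minimal sketch, assuming the progress event is available as a Python dictionary shaped like the JSON in the reply. The function name outstanding_backlog and the variable sample_progress are hypothetical, and the sample values are the ones quoted in the reply. The metric values arrive as strings, so they are converted to integers before summing.

```python
def outstanding_backlog(progress):
    """Sum numFilesOutstanding and numBytesOutstanding over all sources."""
    files = 0
    size_bytes = 0
    for source in progress.get("sources", []):
        # Sources that report no metrics contribute nothing to the backlog.
        metrics = source.get("metrics") or {}
        files += int(metrics.get("numFilesOutstanding", 0))
        size_bytes += int(metrics.get("numBytesOutstanding", 0))
    return files, size_bytes


# Sample progress fragment mirroring the metrics quoted in the reply.
sample_progress = {
    "sources": [
        {
            "description": "DeltaSource[file:/path/to/source]",
            "metrics": {
                "numBytesOutstanding": "3456",
                "numFilesOutstanding": "8",
            },
        }
    ]
}

print(outstanding_backlog(sample_progress))  # prints: (8, 3456)
```

A backlog that keeps growing from one micro-batch to the next would suggest the stream is falling behind its source, although this counts files and bytes rather than rows.
-->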
      <pubDate>Mon, 13 Dec 2021 17:24:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33464#M24453</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-12-13T17:24:14Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33465#M24454</link>
      <description>&lt;P&gt;I am.&lt;/P&gt;&lt;P&gt;Yerachmiel Feltzman | Data Platform Developer&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Dec 2021 17:51:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33465#M24454</guid>
      <dc:creator>YFL</dc:creator>
      <dc:date>2021-12-13T17:51:32Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33466#M24455</link>
      <description>&lt;P&gt;@Yerachmiel Feltzman​&amp;nbsp;- Does the fact you can see the metrics resolve the issue? &lt;/P&gt;</description>
      <pubDate>Tue, 28 Dec 2021 16:15:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33466#M24455</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-28T16:15:56Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33467#M24456</link>
      <description>&lt;P&gt;Hi, @Piper Wilson&amp;nbsp;. Those metrics don't solve the issue. What I am asking for is a way to keep track of the lag between the running stream and the source Delta table. Those metrics give me a lot of information, like input rates (as bytes and as files) and some other metrics, but I don't see the lag from the source table there.&lt;/P&gt;&lt;P&gt;Again, by lag, I mean "the difference between the source table's last record position/timestamp and the current row my stream is processing". For a simplistic example, if the source table has 100 rows and the stream has processed 90, it is lagging by 10 rows. &lt;/P&gt;&lt;P&gt;What I have in mind is something similar to Kafka's consumer lag, but another approach is welcome as well. The whole point here is: "How do I keep track of where my stream is relative to its source? How can I know if it is processing slower than expected, i.e., slower than its source Delta table is being written?"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
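<!--
One possible approximation of the Kafka-style lag described above is to compare the latest version of the source table with the reservoirVersion in the stream's endOffset. This is a sketch under stated assumptions, not a confirmed answer from this thread: the function name version_lag is hypothetical, the latest version 39680 is an invented sample value, and how the latest version is obtained (for example from the table history) is left outside the sketch. The result counts table versions, not rows, and the exact meaning of the index field is treated as an implementation detail of the Delta source.

```python
import json


def version_lag(latest_table_version, end_offset):
    """Approximate lag, in table versions, between the source table and the stream."""
    # Accept the offset either as a JSON string or as an already parsed dict.
    offset = json.loads(end_offset) if isinstance(end_offset, str) else end_offset
    # Clamp at zero in case the table version was read before the offset advanced.
    return max(0, latest_table_version - offset["reservoirVersion"])


# 39674 is the endOffset version from the original post; 39680 is a made-up
# latest table version used only for illustration.
print(version_lag(39680, {"reservoirVersion": 39674, "index": -1}))  # prints: 6
```

Tracking this number over time, rather than reading a single value, is what would show whether the stream is keeping up with writes to the source table.
-->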
      <pubDate>Mon, 03 Jan 2022 09:33:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33467#M24456</guid>
      <dc:creator>YFL</dc:creator>
      <dc:date>2022-01-03T09:33:32Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33468#M24457</link>
      <description>&lt;P&gt;Hi @Yerachmiel Feltzman&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You will need to take a look at the micro-batch metrics. This article explains in more detail what each metric means: &lt;A href="https://databricks.com/blog/2020/07/29/a-look-at-the-new-structured-streaming-ui-in-apache-spark-3-0.html" alt="https://databricks.com/blog/2020/07/29/a-look-at-the-new-structured-streaming-ui-in-apache-spark-3-0.html" target="_blank"&gt;https://databricks.com/blog/2020/07/29/a-look-at-the-new-structured-streaming-ui-in-apache-spark-3-0.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jan 2022 00:43:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33468#M24457</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-01-26T00:43:52Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33469#M24458</link>
      <description>&lt;P&gt;Hey @Yerachmiel Feltzman&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope all is well.&lt;/P&gt;&lt;P&gt;Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2022 13:44:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33469#M24458</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-05-12T13:44:35Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33470#M24459</link>
      <description>&lt;P&gt;Hey @Vartika Nain​&amp;nbsp;. The issue was resolved. Thanks.&lt;/P&gt;</description>
      <pubDate>Sun, 15 May 2022 12:53:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/33470#M24459</guid>
      <dc:creator>YFL</dc:creator>
      <dc:date>2022-05-15T12:53:30Z</dc:date>
    </item>
    <item>
      <title>Re: When delta is a streaming source, how can we get the consumer lag?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/83365#M36912</link>
      <description>&lt;P&gt;Hello, how did you solve this problem?&lt;BR /&gt;Could you kindly share the solution with me? I have the same problem.&lt;/P&gt;&lt;P&gt;I would also like to see more details.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2024 18:43:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-delta-is-a-streaming-source-how-can-we-get-the-consumer-lag/m-p/83365#M36912</guid>
      <dc:creator>chinhvu1111</dc:creator>
      <dc:date>2024-08-18T18:43:33Z</dc:date>
    </item>
  </channel>
</rss>

