<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What does durationMs.commitBatch measure? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104259#M41695</link>
    <description>&lt;P&gt;Have I understood correctly that it is the time to write the data to the sink and also update the checkpoint location?&lt;/P&gt;&lt;P&gt;How does it relate to, e.g., addBatch, which is "The time taken to execute the microbatch"? In the example I linked to, we have "addBatch" : 5397, "commitBatch" : 4429.&lt;/P&gt;&lt;P&gt;Does that mean that computing the actual microbatch took about 5.4 s, and writing it out and committing it took about 4.4 s, for a total of about 9.8 s?&lt;/P&gt;&lt;P&gt;And why is it not always present? E.g., in &lt;A href="https://docs.databricks.com/en/structured-streaming/stream-monitoring.html#example-rate-source-to-delta-lake-streamingquerylistener-event" target="_blank" rel="noopener"&gt;this&lt;/A&gt; example with a Delta sink, &lt;A href="https://docs.databricks.com/en/structured-streaming/stream-monitoring.html#example-kafka-to-kafka-streamingquerylistener-event" target="_blank" rel="noopener"&gt;this&lt;/A&gt; example with Kafka-to-Kafka, or &lt;A href="https://docs.databricks.com/en/structured-streaming/stream-monitoring.html#example-rate-source-to-delta-lake-streamingquerylistener-event" target="_blank" rel="noopener"&gt;this&lt;/A&gt; Delta-to-Delta example?&lt;/P&gt;</description>
    <pubDate>Sun, 05 Jan 2025 21:30:11 GMT</pubDate>
    <dc:creator>Erik</dc:creator>
    <dc:date>2025-01-05T21:30:11Z</dc:date>
    <item>
      <title>What does durationMs.commitBatch measure?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104244#M41688</link>
      <description>&lt;P&gt;In a Structured Streaming job reading from Kafka, we have a metric in durationMs called commitBatch. There is also an example of it in &lt;A href="https://docs.databricks.com/en/structured-streaming/stream-monitoring.html#example-kinesis-to-delta-lake-streamingquerylistener-event" target="_blank"&gt;this&lt;/A&gt; Databricks documentation page. I cannot find any description of what it measures or how it relates to the other metrics.&lt;/P&gt;</description>
      <pubDate>Sun, 05 Jan 2025 13:50:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104244#M41688</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2025-01-05T13:50:15Z</dc:date>
    </item>
    <item>
      <title>Re: What does durationMs.commitBatch measure?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104246#M41690</link>
      <description>&lt;P&gt;The &lt;CODE&gt;commitBatch&lt;/CODE&gt; metric in the &lt;CODE&gt;durationMs&lt;/CODE&gt; object measures the time taken to commit the batch of data being processed. This includes the time required to write the batch data to the sink and update the offsets to reflect the processed data.&lt;/P&gt;</description>
      <pubDate>Sun, 05 Jan 2025 14:10:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104246#M41690</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-05T14:10:19Z</dc:date>
    </item>
    <item>
      <title>Re: What does durationMs.commitBatch measure?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104259#M41695</link>
      <description>&lt;P&gt;Have I understood correctly that it is the time to write the data to the sink and also update the checkpoint location?&lt;/P&gt;&lt;P&gt;How does it relate to, e.g., addBatch, which is "The time taken to execute the microbatch"? In the example I linked to, we have "addBatch" : 5397, "commitBatch" : 4429.&lt;/P&gt;&lt;P&gt;Does that mean that computing the actual microbatch took about 5.4 s, and writing it out and committing it took about 4.4 s, for a total of about 9.8 s?&lt;/P&gt;&lt;P&gt;And why is it not always present? E.g., in &lt;A href="https://docs.databricks.com/en/structured-streaming/stream-monitoring.html#example-rate-source-to-delta-lake-streamingquerylistener-event" target="_blank" rel="noopener"&gt;this&lt;/A&gt; example with a Delta sink, &lt;A href="https://docs.databricks.com/en/structured-streaming/stream-monitoring.html#example-kafka-to-kafka-streamingquerylistener-event" target="_blank" rel="noopener"&gt;this&lt;/A&gt; example with Kafka-to-Kafka, or &lt;A href="https://docs.databricks.com/en/structured-streaming/stream-monitoring.html#example-rate-source-to-delta-lake-streamingquerylistener-event" target="_blank" rel="noopener"&gt;this&lt;/A&gt; Delta-to-Delta example?&lt;/P&gt;</description>
      <pubDate>Sun, 05 Jan 2025 21:30:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104259#M41695</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2025-01-05T21:30:11Z</dc:date>
    </item>
    <item>
      <title>Re: What does durationMs.commitBatch measure?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104369#M41717</link>
      <description>&lt;P&gt;The &lt;CODE&gt;commitBatch&lt;/CODE&gt; metric is a part of the overall &lt;CODE&gt;triggerExecution&lt;/CODE&gt; time, which encompasses all stages of planning and executing the microbatch, including committing the batch data and updating offsets.&lt;/P&gt;
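&lt;P&gt;To check empirically how the phases add up for your own query, you can compare the individual &lt;CODE&gt;durationMs&lt;/CODE&gt; entries against &lt;CODE&gt;triggerExecution&lt;/CODE&gt;. The helper below is a minimal sketch; its name &lt;CODE&gt;phase_breakdown&lt;/CODE&gt; is hypothetical and not a Spark API. It assumes the reported phases run one after another without overlapping. A negative "unattributed" share suggests that assumption does not hold for that progress event, for example if one phase is measured inside another. In PySpark, the input dictionary can be taken from &lt;CODE&gt;query.lastProgress["durationMs"]&lt;/CODE&gt;.&lt;/P&gt;
&lt;PRE&gt;
```python
from typing import Dict

# Key of the overall per-trigger duration; every other key is treated as a phase.
TOTAL_KEY = "triggerExecution"


def phase_breakdown(duration_ms: Dict[str, int]) -> Dict[str, float]:
    """Return each phase's share of triggerExecution plus the unattributed remainder.

    Hypothetical helper. Assumes the reported phases run sequentially (no overlap);
    a negative "unattributed" value means that assumption does not hold here.
    """
    total = duration_ms.get(TOTAL_KEY)
    if not total:
        raise ValueError("durationMs has no positive triggerExecution value")
    # Every entry except the total is treated as one phase of the trigger.
    phases = {k: v for k, v in duration_ms.items() if k != TOTAL_KEY}
    shares = {k: round(v / total, 3) for k, v in phases.items()}
    # Time inside triggerExecution not covered by any reported phase.
    shares["unattributed"] = round((total - sum(phases.values())) / total, 3)
    return shares
```
&lt;/PRE&gt;
&lt;P&gt;With the two values from this thread and a hypothetical triggerExecution of 10000 ms, phase_breakdown({"addBatch": 5397, "commitBatch": 4429, "triggerExecution": 10000}) returns shares of 0.54 for addBatch, 0.443 for commitBatch, and 0.017 unattributed.&lt;/P&gt;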
&lt;P class="_1t7bu9h1 paragraph"&gt;The &lt;CODE&gt;commitBatch&lt;/CODE&gt; metric may not always be present in every example. Its presence depends on the specific implementation and the metrics that are being tracked for that particular streaming query. For instance, in the examples you mentioned:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The &lt;CODE&gt;rate source to Delta Lake&lt;/CODE&gt; example does not include &lt;CODE&gt;commitBatch&lt;/CODE&gt; because it may not be relevant or tracked for that specific query.&lt;/LI&gt;
&lt;LI&gt;The &lt;CODE&gt;Kafka-to-Kafka&lt;/CODE&gt; example also does not include &lt;CODE&gt;commitBatch&lt;/CODE&gt;, possibly due to differences in how metrics are collected or reported for Kafka sinks.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Mon, 06 Jan 2025 14:37:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-does-durationms-commitbatch-measure/m-p/104369#M41717</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-06T14:37:15Z</dc:date>
    </item>
  </channel>
</rss>

