Data Engineering

What does durationMs.commitBatch measure?

Erik
Valued Contributor III

With a Structured Streaming job reading from Kafka, we have a metric in durationMs called commitBatch. There is also an example of this in the Databricks documentation. I cannot find any description of what this measures or how it relates to the other metrics.

3 REPLIES

Walter_C
Databricks Employee

The commitBatch metric in the durationMs object measures the time taken to commit the batch of data being processed. This includes the time required to write the batch data to the sink and update the offsets to reflect the processed data.
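For reference, here is a minimal sketch of how to inspect that breakdown yourself, using a toy rate-source-to-memory-sink query rather than the Kafka job from this thread. Which keys appear under durationMs (addBatch, walCommit, commitOffsets, commitBatch, ...) depends on the Spark/DBR version and the source/sink combination, so commitBatch may or may not be reported for this toy query.

```python
# Minimal sketch, assuming a plain PySpark session; the rate source and
# memory sink are placeholders, not the Kafka job from this thread.
# Which keys appear under durationMs depends on the Spark/DBR version
# and the source/sink, so commitBatch may or may not be present here.
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("durationMs-demo").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

query = (
    stream.writeStream
          .format("memory")
          .queryName("rate_demo")
          .start()
)

time.sleep(15)                       # let a few micro-batches complete

progress = query.lastProgress        # dict parsed from the progress JSON, or None
if progress is not None:
    for stage, millis in progress["durationMs"].items():
        print(f"{stage}: {millis} ms")

query.stop()
```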

Erik
Valued Contributor III

Have I understood correctly that it is the time taken to write the data to the sink and also to update the checkpoint location?

How does it relate to, e.g., addBatch, which is "the time taken to execute the microbatch"? In the example I linked to we have "addBatch" : 5397, "commitBatch" : 4429.

Does that mean that computing the actual microbatch took 5 s, and writing it out and committing it took 4.4 s, for a total of 9.4 s?

And why is it not always present? It is missing, for example, in this example with a Delta sink, this example with Kafka-to-Kafka, and this Delta-to-Delta example.

Walter_C
Databricks Employee

The commitBatch metric is a part of the overall triggerExecution time, which encompasses all stages of planning and executing the microbatch, including committing the batch data and updating offsets.

The commitBatch metric is not present in every example. Its presence depends on the specific implementation and on which metrics are tracked for that particular streaming query (a way to check which duration keys a given query reports is sketched after this list). For instance, in the examples you mentioned:

  • The rate source to Delta Lake example does not include commitBatch because it may not be relevant or tracked for that specific query.
  • The Kafka-to-Kafka example also does not include commitBatch, possibly due to differences in how metrics are collected or reported for Kafka sinks.
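One way to see which durationMs keys a query actually reports, and whether commitBatch is among them, is a streaming query listener. This is a hedged sketch, assuming the PySpark StreamingQueryListener API (Spark 3.4+ / recent DBR); attach it with spark.streams.addListener before starting the query.

```python
# Hedged sketch (PySpark StreamingQueryListener API assumed): log the
# durationMs breakdown for every completed micro-batch and treat
# commitBatch as optional, since it is not reported for every
# source/sink combination.
from pyspark.sql.streaming import StreamingQueryListener


class DurationMsListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        p = event.progress
        durations = dict(p.durationMs)        # stage name -> milliseconds
        commit_batch = durations.get("commitBatch", "n/a")
        print(
            f"batch={p.batchId} "
            f"triggerExecution={durations.get('triggerExecution')} ms "
            f"addBatch={durations.get('addBatch')} ms "
            f"commitBatch={commit_batch}"
        )

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        pass


# Attach before starting the streaming query:
# spark.streams.addListener(DurationMsListener())
```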
