Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

StreamingQueryListener metrics strange behaviour (inputRowsPerSecond metric is set to 0)

YuriS
New Contributor II

After implementing a StreamingQueryListener to integrate with our monitoring solution (based on https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/stream-monitoring), we have noticed some strange metrics for our DeltaSource streams.
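For context, the monitoring integration consumes the progress payload that the listener receives on each update. A simplified sketch (no Spark required; the field names match the progress JSON that Structured Streaming emits, but the sample values below are made up):

```python
import json

# Hypothetical, trimmed-down sample of a StreamingQueryProgress payload,
# limited to the fields a monitoring integration typically forwards.
# The values here are invented for illustration.
sample_progress = json.loads("""
{
  "batchId": 42,
  "numInputRows": 1000,
  "inputRowsPerSecond": 0.0,
  "processedRowsPerSecond": 512.8,
  "durationMs": {"triggerExecution": 1950, "addBatch": 1800}
}
""")

def extract_metrics(progress: dict) -> dict:
    """Pull out the fields worth shipping to a monitoring backend."""
    return {
        "batch_id": progress["batchId"],
        "num_input_rows": progress["numInputRows"],
        "input_rows_per_second": progress["inputRowsPerSecond"],
        "trigger_execution_ms": progress["durationMs"]["triggerExecution"],
    }

metrics = extract_metrics(sample_progress)
print(metrics)
```

Note how this sample reproduces the oddity: numInputRows is non-zero while inputRowsPerSecond is 0.0.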

In some cases, the inputRowsPerSecond metric for DeltaSource streams is reported as 0:

YuriS_0-1769419735190.png

For the particular event (the same is visible in the Spark UI):

YuriS_1-1769419836870.png

Also, it would be good to understand the difference between a batch and a trigger - are these the same, or is the difference only visible when batches are restarted?

Thank you

 

1 REPLY

hasnat_unifeye
New Contributor III

Firstly, let's talk about batch vs trigger.

A trigger is the scheduling event that tells Spark when to check for new data (e.g. processingTime, availableNow, once). A batch (micro-batch) is the actual unit of work: it reads input, processes the data, and commits results. In many cases there is a 1:1 relationship, so they appear the same, but they are conceptually different. The difference becomes visible during restarts, backlog processing, or when a trigger fires but no data is available.
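The distinction above can be sketched in plain Python, with no Spark involved: a trigger fires on every scheduling tick, but a micro-batch only runs when the source has new data. The per-tick data availability below is made up for illustration.

```python
def run_ticks(data_available_per_tick):
    """Simulate a processingTime trigger over a series of ticks.

    Each tick represents the trigger firing on schedule; a batch is
    counted only when the (simulated) source reports new data.
    """
    triggers_fired = 0
    batches_run = 0
    for has_data in data_available_per_tick:
        triggers_fired += 1      # the trigger always fires on schedule
        if has_data:
            batches_run += 1     # a batch executes only if there is work
    return triggers_fired, batches_run

# Five scheduled ticks, but new data arrived during only three of them.
triggers, batches = run_ticks([True, False, True, True, False])
print(triggers, batches)  # 5 3
```

When data arrives on every tick, the two counts coincide, which is why the terms often look interchangeable in practice.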

This video gives a clear explanation of trigger behaviour in Structured Streaming:
https://www.youtube.com/watch?v=t7cRAIgVduQ

Regarding the metrics shown (batchDuration and triggerExecution being equal):
this looks strange, but it is expected for micro-batch streaming when a single batch fully occupies the trigger window. The trigger execution time often includes Delta metadata work and waiting, so both values can collapse to the same duration.

This also explains why inputRowsPerSecond can be reported as 0.0 for DeltaSource streams. The metric is derived from numInputRows divided by the elapsed trigger time, which is admittedly a little counter-intuitive. When most of the trigger time is spent waiting or doing metadata operations rather than actively reading rows, Spark may report an effective input rate of zero even though rows were processed. For monitoring, I would say numInputRows is the more reliable metric.
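A small sketch of that arithmetic, assuming the rate is roughly numInputRows divided by the elapsed seconds for the trigger (the exact internal formula is Spark's; this is an illustration, not its source code):

```python
def input_rate(num_input_rows: int, elapsed_ms: float) -> float:
    """Approximate rows/second from a row count and an elapsed duration."""
    if elapsed_ms <= 0:
        return 0.0
    return num_input_rows / (elapsed_ms / 1000.0)

# 1,000 rows read in a 2-second trigger: a healthy rate.
print(input_rate(1000, 2_000))    # 500.0

# The same 1,000 rows, but the trigger window ballooned to 10 minutes
# of metadata work and waiting: the reported rate collapses toward zero.
print(round(input_rate(1000, 600_000), 1))  # 1.7
```

So the rows were genuinely processed; it is the denominator (mostly wait/metadata time) that drags the reported rate toward zero.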


So - are they the same? No. A trigger defines when Spark checks for work (e.g. an interval). A batch runs if and only if data is available when the trigger fires.
Is the difference only visible when batches are restarted? No - restarts are just one case; it also shows up during backlog processing or when a trigger fires and no new data is available.
