Monitoring a Streaming Job

NandiniN
Databricks Employee

If you have a streaming job, check the batch metrics to understand how the stream is progressing.
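As a rough illustration of reading batch metrics, here is a minimal sketch that summarizes one progress record. It assumes the record is dict-like, as returned by `query.lastProgress` in PySpark, and uses the standard progress fields `batchId`, `numInputRows`, `inputRowsPerSecond`, `processedRowsPerSecond` and `durationMs.triggerExecution`. The helper names `summarize_progress` and `is_falling_behind` are made up for this example and are not part of any Spark API.

```python
from typing import Any, Mapping


def summarize_progress(progress: Mapping[str, Any]) -> str:
    """Format the key fields of one streaming progress record (hypothetical helper)."""
    duration = progress.get("durationMs", {})
    return (
        f"batch={progress.get('batchId')} "
        f"rows={progress.get('numInputRows')} "
        f"in/s={progress.get('inputRowsPerSecond')} "
        f"proc/s={progress.get('processedRowsPerSecond')} "
        f"triggerMs={duration.get('triggerExecution')}"
    )


def is_falling_behind(progress: Mapping[str, Any]) -> bool:
    """Return True when rows arrive faster than they are processed (hypothetical helper)."""
    in_rate = progress.get("inputRowsPerSecond") or 0.0
    proc_rate = progress.get("processedRowsPerSecond") or 0.0
    return in_rate > proc_rate
```

In a notebook this could be called as `summarize_progress(query.lastProgress)`, where `query` is the running streaming query; check for `None` first, since no progress is available before the first batch completes. A `batchId` that stops increasing, or a processing rate that stays below the input rate, is usually the first sign that the stream needs attention.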

Beyond the batch metrics, here are some other ways to monitor a streaming job and detect when it is stuck in a "hung" state.

  1. Use streaming listeners.
  2. Set spark.databricks.hangingTaskDetector.assumeTaskMakingProgressForZeroMetrics to false (default: true).
  3. If it is a stateless stream, turn on speculative execution by setting spark.speculation to true.
  4. Set job notifications.
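To make the listener suggestion more concrete, here is a minimal sketch of the bookkeeping a streaming listener could feed in order to flag a hung stream. It is plain Python with no Spark dependency; the class `StallWatchdog`, its method names and the timeout value are assumptions for illustration, not a Databricks or Spark API.

```python
import time
from typing import Callable, Dict, List


class StallWatchdog:
    """Tracks when each query last reported progress (hypothetical helper)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_seen: Dict[str, float] = {}

    def record_progress(self, query_id: str) -> None:
        # Call when a query starts and whenever it reports progress.
        self._last_seen[query_id] = self._clock()

    def forget(self, query_id: str) -> None:
        # Call when a query terminates, so it is no longer reported as stalled.
        self._last_seen.pop(query_id, None)

    def stalled(self) -> List[str]:
        # Queries whose last progress report is older than the timeout.
        now = self._clock()
        return sorted(
            q for q, seen in self._last_seen.items() if now - seen > self.timeout_s
        )
```

One way to wire this up, assuming a PySpark version that provides StreamingQueryListener, is to call `record_progress` from the listener's onQueryStarted and onQueryProgress callbacks, call `forget` from onQueryTerminated, and have a separate periodic check alert on whatever `stalled()` returns. The timeout should be comfortably longer than the normal trigger interval, otherwise a slow but healthy batch would be reported as hung.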

And of course, don't forget to collect a thread dump before you cancel the stream and raise a support ticket to investigate further.