For a streaming job, the per-batch metrics are the primary way to understand stream progress. Beyond that, here are some other ways to monitor a streaming job and detect when it is stuck in a "hung" state:
- Attach a streaming listener (`StreamingQueryListener`) to log per-batch progress events.
- Set `spark.databricks.hangingTaskDetector.assumeTaskMakingProgressForZeroMetrics` to `false` (default: `true`), so tasks reporting zero metrics are not assumed to be making progress.
- If the stream is stateless, you can also enable speculative execution by setting `spark.speculation` to `true`.
- Set up job notifications so you are alerted when the job fails or runs abnormally long.
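The listener and configuration suggestions above can be sketched as follows. This is a minimal sketch, assuming PySpark 3.5+ and a live `SparkSession` named `spark` (as in a Databricks notebook); `ProgressWatcher` is a hypothetical name, and the two `spark.conf.set` flags are taken from the text, so verify them against your runtime version.

```python
from pyspark.sql.streaming import StreamingQueryListener

class ProgressWatcher(StreamingQueryListener):
    """Logs per-batch progress so a hung stream is visible in the driver logs."""

    def onQueryStarted(self, event):
        print(f"Query started: {event.id}")

    def onQueryProgress(self, event):
        p = event.progress
        print(f"batch={p.batchId} inputRows={p.numInputRows}")

    def onQueryIdle(self, event):
        # Fires when no new data arrives; silence here vs. in onQueryProgress
        # helps distinguish "no input" from "stuck processing".
        pass

    def onQueryTerminated(self, event):
        print(f"Query terminated: {event.id}")

# Register the listener for all streaming queries on this session.
spark.streams.addListener(ProgressWatcher())

# Databricks-specific flag from the text: do NOT assume a task with
# zero metrics is making progress (default is true).
spark.conf.set(
    "spark.databricks.hangingTaskDetector.assumeTaskMakingProgressForZeroMetrics",
    "false",
)

# Stateless streams only: let Spark speculatively re-launch slow tasks.
spark.conf.set("spark.speculation", "true")
```

Keeping the listener's output lightweight (a line per batch) is usually enough: a gap in these log lines is a quick signal that the stream has stopped making progress.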
And of course, don't forget to collect a thread dump before you cancel the stream, then raise a support ticket so the hang can be investigated further.
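For the thread dump, one common approach (assuming shell access to the driver node and a JDK on the path; on Databricks you can also use the Spark UI's Executors tab to capture thread dumps) is a sketch like:

```shell
# List running JVMs to find the Spark driver/executor pid.
jps -lm

# Capture a thread dump from the chosen pid (placeholder, fill in from jps output).
jstack -l <pid> > "threaddump_$(date +%s).txt"
```

Take two or three dumps a minute apart: a thread stuck on the same stack frame across dumps is a much stronger signal of a hang than a single snapshot.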