If you use Databricks Jobs for your workloads, you may have run into a situation where a job appears to be stuck in a "hung" state.
Before cancelling the job, it is important to collect a thread dump, as I described here, so you can find the root cause.
But how do we avoid leaving jobs in a "hung" state for a prolonged time in the first place?
If you know the expected completion time of the job, you should always set a job timeout (with some buffer added) as described here.
To configure a maximum completion time for a job, enter the maximum duration in the Timeout field. If the job does not complete in this time, Databricks sets its status to "Timed Out" and the job is stopped.
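Besides the Timeout field in the UI, the same limit can be set programmatically: the Jobs API accepts a `timeout_seconds` field in the job settings. A minimal sketch of such a settings payload is below; the job name and timeout value are placeholders:

```json
{
  "name": "example-etl-job",
  "timeout_seconds": 3600
}
```

With `timeout_seconds` set to 3600, any run exceeding one hour is stopped and marked Timed Out; a value of 0 (the default) means no timeout is applied.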
For streaming jobs, check my post here.