If you use Databricks Jobs for your workloads, you may have run into a situation where a job appears to be stuck in a "hung" state.
Before cancelling the job, it is important to collect a thread dump, as I described here, so you can find the root cause.
But how do we avoid leaving jobs in a "hung" state for a prolonged time in the first place?
If you know the expected completion time of the job, you should always set a job timeout (with some buffer added) as described here.
To configure a maximum completion time for a job, enter the maximum duration in the Timeout field. If the job does not complete in this time, Databricks sets its status to "Timed Out" and the job is stopped.
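Besides the Timeout field in the UI, the same limit can be set programmatically: the Jobs API accepts a `timeout_seconds` field in the job settings. A minimal sketch of such a settings payload is below; the job name and timeout value are placeholders:

```json
{
  "name": "example-etl-job",
  "timeout_seconds": 3600
}
```

With `timeout_seconds` set to 3600, any run exceeding one hour is stopped and marked Timed Out; a value of 0 (the default) means no timeout is applied.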
For streaming jobs, check my post here.