Databricks Job scheduling - continuous mode

smurug — Tue, 01 Aug 2023 18:30:03 GMT

When a Databricks job is scheduled in continuous mode, what happens if the job is configured to run on a job cluster?

At the end of each run, will the cluster be terminated and re-created for the next run? The official documentation is not clear on this; it only mentions that there will be a slight delay between runs, of less than 60 seconds.

However, a quick practical check of this scenario suggests that the cluster is being re-created: a simple do-nothing notebook takes 2 minutes to complete, and from the logs it looks like different clusters are used. This is not conclusive, though.

I would appreciate any thoughts on this. Logically, the continuous option should re-use the cluster (to save on start-up time); otherwise the value this option brings is limited.
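
The practical check described above could be made more conclusive by comparing the cluster IDs recorded for successive runs. The following is a minimal sketch, assuming run records shaped like the Jobs API run listing, where each run (or each task within a run) carries a `cluster_instance` object with a `cluster_id`. The sample records and IDs are hypothetical, and the field layout should be verified against the Jobs API reference for your workspace.

```python
def distinct_cluster_ids(runs):
    """Return the distinct cluster IDs seen across run records, in order.

    Looks at a top-level "cluster_instance" as well as one nested in
    each entry of "tasks" (assumed layout, see the note above).
    """
    ids = []
    for run in runs:
        instances = [run.get("cluster_instance")]
        instances += [t.get("cluster_instance") for t in run.get("tasks", [])]
        for inst in instances:
            cid = (inst or {}).get("cluster_id")
            if cid and cid not in ids:
                ids.append(cid)
    return ids


# Hypothetical records for two consecutive runs of the same continuous job.
sample_runs = [
    {"run_id": 1, "tasks": [{"task_key": "main",
                             "cluster_instance": {"cluster_id": "0801-aaa"}}]},
    {"run_id": 2, "tasks": [{"task_key": "main",
                             "cluster_instance": {"cluster_id": "0801-bbb"}}]},
]

ids = distinct_cluster_ids(sample_runs)
# More than one distinct ID means the runs did not share a cluster.
print("cluster re-created between runs:", len(ids) > 1)
```

If every run reports the same cluster ID, the cluster is being re-used; a new ID per run points to a fresh cluster each time.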

Re: Databricks Job scheduling - continuous mode

Tharun-Kumar — Wed, 02 Aug 2023 07:34:32 GMT

@smurug

A job cluster is designed to be unique to each run of a job, so each run of your job would run against a new job cluster.

If you want your job to run continuously without any delay and to re-use the cluster, I would recommend using a dedicated interactive cluster. In that case, the cluster is retained across job runs, and each run starts immediately after the previous run completes.
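
To make the two options concrete, here is a rough sketch that builds job-creation payloads for a continuous job in both styles. It assumes the Jobs API 2.1 field names `continuous`, `tasks`, `existing_cluster_id` and `new_cluster`; the helper name, notebook path, cluster ID, runtime version and node type are placeholders for illustration, not values from this thread.

```python
import json


def continuous_job_payload(name, notebook_path, *,
                           existing_cluster_id=None, new_cluster=None):
    """Build a create-job payload for a continuous job (hypothetical helper).

    Exactly one of existing_cluster_id (interactive cluster, retained
    across runs) or new_cluster (job cluster, created per run) is required.
    """
    if (existing_cluster_id is None) == (new_cluster is None):
        raise ValueError("pass exactly one of existing_cluster_id or new_cluster")
    task = {
        "task_key": "main",
        "notebook_task": {"notebook_path": notebook_path},
    }
    if existing_cluster_id is not None:
        task["existing_cluster_id"] = existing_cluster_id
    else:
        task["new_cluster"] = new_cluster
    return {
        "name": name,
        "continuous": {"pause_status": "UNPAUSED"},
        "tasks": [task],
    }


# Option A: run on an already running interactive cluster (placeholder ID).
on_interactive = continuous_job_payload(
    "continuous-demo", "/Shared/demo_notebook",
    existing_cluster_id="0000-000000-placeholder",
)

# Option B: run on a job cluster defined inline (placeholder sizing).
on_job_cluster = continuous_job_payload(
    "continuous-demo", "/Shared/demo_notebook",
    new_cluster={"spark_version": "13.3.x-scala2.12",
                 "node_type_id": "i3.xlarge",
                 "num_workers": 1},
)

print(json.dumps(on_interactive, indent=2))
```

Either payload would be sent to the job-creation endpoint; only the cluster field on the task differs between the two approaches.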

Re: Databricks Job scheduling - continuous mode

smurug — Wed, 02 Aug 2023 14:30:43 GMT

Thanks for the response. Yes, we are currently doing this (using an interactive cluster). However, the following points are prompting us to re-evaluate this approach and look for an alternative, if one exists:

1) Cost difference between Interactive and Job cluster

2) In the Production environment, the following error is received every now and then:

run failed with error message Context ExecutionContextId(1496834584910869936) is disconnected.

While this error can occur for multiple reasons, our understanding is that cluster resource constraints are one of the main ones. Hence the idea is to have individual job clusters for different jobs, which can be scaled independently, so that each job gets dedicated resources instead of sharing one interactive cluster's resources with all the other jobs. Creating many interactive clusters might not be feasible given the cost, whereas using job clusters can offset some of that cost and help reduce the overall spend.

Further, while searching around the net, I found this article, https://medium.com/@24chynoweth/continuous-jobs-and-file-triggers-in-databricks-e7ba51a0c93a, which mentions resources being re-used.

Also, the official documentation, https://docs.databricks.com/workflows/jobs/schedule-jobs.html, does not say anything clear about re-use or termination, but it mentions that there will be a slight delay of less than 60 seconds. If the cluster had to be re-created, I don't think a delay of only 60 seconds could be guaranteed.

Re: Databricks Job scheduling - continuous mode

Jo5h — Fri, 29 Sep 2023 08:49:19 GMT

Hello @youssefmrini

So how are DBUs calculated? Since the cluster is reused, the DBUs should be calculated per hour across all the job runs in that hour, correct? Or will they be calculated separately for each run?

I would like to know how the cost is calculated when running a continuous job.
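
One way to frame the arithmetic behind this question is sketched below. It assumes that DBU consumption follows cluster uptime rather than the number of runs; nothing in this thread confirms that model, so it should be checked against the Databricks pricing documentation. The function name and both rates are hypothetical placeholders, not real prices.

```python
def uptime_cost(uptime_seconds, dbu_per_hour, price_per_dbu):
    """Cost under the ASSUMED model: DBUs accrue while the cluster is up."""
    hours = uptime_seconds / 3600
    return hours * dbu_per_hour * price_per_dbu


# Hypothetical rates for illustration only.
DBU_PER_HOUR = 2.0
PRICE_PER_DBU = 0.5

# One cluster kept up for a full hour, regardless of how many runs it served.
one_hour_total = uptime_cost(3600, DBU_PER_HOUR, PRICE_PER_DBU)

# The same hour viewed as thirty back-to-back 2-minute runs.
per_run_total = sum(
    uptime_cost(120, DBU_PER_HOUR, PRICE_PER_DBU) for _ in range(30)
)

print(one_hour_total, per_run_total)
```

Under this assumption the two views give the same total up to floating-point rounding, so the distinction between "per hour" and "per run" only matters if clusters sit idle or incur start-up time between runs.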