02-03-2022 01:38 PM
Currently, I am running a cluster that is set to terminate after 60 minutes of inactivity. However, in one of my notebooks, one of the cells is still running. How can I prevent this from happening if I want my notebook to run overnight without monitoring it, and why is this happening?
02-03-2022 02:07 PM
@Kevin Kim As mentioned in the docs at https://docs.databricks.com/clusters/clusters-manage.html#automatic-termination-1, an auto-terminating cluster may be terminated while it is still running commands. From the docs: "The auto termination feature monitors only Spark jobs, not user-defined local processes. Therefore, if all Spark jobs have completed, a cluster may be terminated even if local processes are running."
If you want to keep the cluster active all the time, you can either disable "Automatic termination" (if allowed), or create a notebook with a simple print or "%sql select 1" command and schedule it to run at regular intervals (avoid scheduling it to run forever) to keep the cluster active.
Also, have you explored scheduling the notebook as a job? Auto termination applies only to all-purpose clusters.
***Note: Idle clusters continue to accumulate DBU and cloud instance charges during the inactivity period before termination.***
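The "keep alive" approach above is usually implemented as a background thread that fires a trivial Spark job on an interval. The sketch below shows only the thread mechanics, with a generic `run_query` callable standing in for something like `lambda: spark.sql("select 1").collect()` inside a notebook; the function name and parameters are illustrative, not a Databricks API.

```python
import threading


def keep_alive(run_query, interval_s=300):
    """Run `run_query` every `interval_s` seconds on a daemon thread.

    `run_query` is a placeholder for a lightweight Spark action such as
    `lambda: spark.sql("select 1").collect()`, which submits a tiny Spark
    job so the auto-termination monitor sees activity.
    Returns (stop_event, thread); set `stop_event` to stop the loop.
    """
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            run_query()                   # submit the lightweight Spark job
            stop_event.wait(interval_s)   # sleep, but wake promptly on stop

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return stop_event, t
```

The daemon flag matters: if the main notebook cell finishes or fails, the thread dies with it instead of keeping the cluster warm forever.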
10-01-2024 12:00 AM - edited 10-01-2024 12:02 AM
Hello
I've been using this solution to keep my session alive when a process needs it: auto termination is set to 10 minutes, and processes that need the cluster use a "keep alive" mechanism (or not).
We spawn a thread that periodically runs a basic spark.sql("select 1").collect().
This worked well on the 10.4 LTS runtime. Then we had to move our Databricks workspace from France Central (Azure) to West Europe and upgrade to the 12.2 LTS runtime, and now the query is no longer counted as activity.
The query is still sent periodically, but it no longer keeps the cluster alive. The cluster shuts down, and when the next tick fires the "select 1", it restarts the cluster; by then we have lost the Spark session and all our DataFrames.
Any idea how to fix this?
02-03-2022 04:18 PM
Can you just turn off the auto termination?
02-03-2022 11:41 PM
Indeed, uncheck the "Terminate after x minutes of inactivity" flag in the cluster configuration.
08-07-2024 12:31 PM
How do you uncheck this? Where can I find this button?
02-10-2022 06:13 AM
If a cell is already running (I assume it's a streaming operation), then the cluster isn't inactive: a cluster should stay up while a cell is running Spark work on it.
On the other hand, if you want to keep your cluster running for a specific period (say 10pm to 8am), you can schedule a job that runs a basic command in a simple notebook on that very cluster at the 55th minute of every hour, using a cron expression for the schedule.
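As a sketch of that scheduling idea, a Jobs API job definition with a cron schedule might look like the fragment below. The cluster ID, notebook path, and job name are hypothetical placeholders; `"0 55 * * * ?"` is a Quartz cron expression meaning "minute 55 of every hour".

```json
{
  "name": "keep-cluster-warm",
  "existing_cluster_id": "1234-567890-abcde123",
  "notebook_task": { "notebook_path": "/Shared/keep_alive" },
  "schedule": {
    "quartz_cron_expression": "0 55 * * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  }
}
```

You would restrict the hours field (e.g. `"0 55 22-23,0-7 * * ?"`) to cover only the overnight window instead of the whole day.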
Also, using cluster API, you can monitor if the cluster is running or not.
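For the monitoring part, the Clusters API exposes a cluster's state via `GET /api/2.0/clusters/get`. A minimal stdlib-only sketch, assuming a personal access token for auth (host, token, and cluster ID values are placeholders):

```python
import json
import urllib.request


def build_cluster_get_request(host, token, cluster_id):
    """Build the GET /api/2.0/clusters/get request for one cluster."""
    url = f"{host}/api/2.0/clusters/get?cluster_id={cluster_id}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


def get_cluster_state(host, token, cluster_id):
    """Return the cluster state string, e.g. RUNNING, PENDING, TERMINATED."""
    req = build_cluster_get_request(host, token, cluster_id)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["state"]
```

Polling `get_cluster_state` from a driver script lets you detect a TERMINATED cluster and restart it (or alert someone) before the overnight run is due.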