Cluster termination issue
11-10-2023 11:59 AM
I am using Databricks Community Edition with a limited cluster (a single driver: 15.3 GB memory, 2 cores, 1 DBU). I am running custom algorithms that perform continuous calculations, writing results to a Delta table every 15 minutes and notifying myself by email over SMTP.
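The notification step described above can be sketched with Python's standard-library smtplib. The addresses, relay host, and credentials below are placeholders, not anything from the original post:

```python
import smtplib
from email.message import EmailMessage

def build_notification(depth, rows_written):
    """Build the email sent after each 15-minute batch completes."""
    msg = EmailMessage()
    msg["Subject"] = f"Batch complete: depth {depth}"
    msg["From"] = "job@example.com"   # hypothetical sender
    msg["To"] = "me@example.com"      # hypothetical recipient
    msg.set_content(f"Wrote {rows_written} rows to the Delta table.")
    return msg

def send_notification(msg, host="smtp.example.com", port=587,
                      user=None, password=None):
    """Send via a TLS-capable SMTP relay (host/credentials are your own)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user:
            server.login(user, password)
        server.send_message(msg)
```

Building the message is separated from sending it so the content can be tested without a live SMTP server.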
The problem: I intend to run calculations to a particular depth (imagine building a hierarchical-deterministic wallet represented as a tree), and those calculations may take a few hours or even up to a day. But for some reason, my cluster is terminated after 1 hour of processing.
I looked for solutions to similar issues, and suggestions such as running spark.sql("select 1") periodically just to keep the cluster alive never worked for me, even when run as a daemon process. As mentioned, I also df.write(...) results to my table, but that doesn't keep the cluster alive long enough either.
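For reference, the "daemon process running a trivial query" approach mentioned above can be sketched as a small helper. The query callable is injected so the loop itself is generic; in a Databricks notebook you would pass something like `lambda: spark.sql("select 1")`. This is a sketch of the keep-alive pattern, not a guarantee it defeats Community Edition's termination policy:

```python
import threading
import time

def start_keepalive(run_query, interval_s=300):
    """Call run_query every interval_s seconds on a daemon thread.

    run_query: a zero-argument callable that issues the heartbeat,
    e.g. lambda: spark.sql("select 1") inside a notebook.
    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            run_query()            # fire the no-op "heartbeat" query
            stop.wait(interval_s)  # sleep, but wake early if stopped

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Because the thread is a daemon, it dies with the notebook process rather than blocking shutdown.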
So, I am wondering whether there is any solution to my problem: is there another way to keep the cluster alive, or are Community Edition users simply limited to less than 1 hour of processing on a cluster?
Thanks in advance.
- Labels:
  - Delta Lake
  - Spark
08-05-2024 05:05 PM
Hi @Retired_mod, my team and I are wondering what happens if we put 0 minutes in the "Terminate after" setting of an all-purpose compute. Thanks!
08-05-2024 06:26 PM
If you set the "Terminate after" setting to 0 minutes when creating an all-purpose compute, the auto-termination feature is turned off.

The "Terminate after" setting specifies an inactivity period in minutes after which the compute should terminate: if the time elapsed since the last command run on the compute exceeds that period, Databricks automatically terminates it. By setting it to 0, you are opting out of auto-termination entirely, so the compute will continue to run, whether active or idle, until it is manually terminated.

Please note that idle compute continues to accumulate DBU and cloud instance charges during any inactivity period before termination.
https://docs.databricks.com/en/compute/clusters-manage.html#configure-automatic-termination
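In the Clusters API, the UI's "Terminate after" setting corresponds to the `autotermination_minutes` field of the cluster spec. A minimal sketch of a cluster definition with auto-termination disabled, where the cluster name, runtime version, and node type are placeholder values:

```json
{
  "cluster_name": "long-running-job",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 0,
  "autotermination_minutes": 0
}
```

Per the Databricks documentation, nonzero values for `autotermination_minutes` specify the idle timeout, while 0 disables automatic termination.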
I am not sure whether the Community Edition has any restriction on that, though.