Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Auto-termination for clusters, jobs, and Delta Live Tables does not terminate clusters on GCP.

638555
New Contributor III

Hello,

I am new to Databricks and have been trying to understand how auto-termination works, but I am unsure whether the problem comes from my configuration or from something else. The symptom is the same in every case: the cluster that Databricks creates on GCP does not auto-terminate, although on the Databricks side each case looks different.

1. For clusters created through the compute interface, I have a single-node cluster (I tried multi-node too) that is set to terminate after 2 hours. I spin it up and attach a notebook or a job to it. After the job finishes, I let the cluster idle for more than 2 hours. Although Databricks shows the cluster as terminated, in GCP I still have a rogue cluster running that was created by Databricks. I have no pools, policies, or anything else configured, and Databricks shows nothing running under all-purpose or job compute.

2. For jobs, the behavior is the same as above if I set the job to run on my running all-purpose cluster.

3. For Delta Live Tables, a job compute cluster is created automatically, and after the pipeline operation finishes I let it idle for more than 2 hours (development environment), yet the job cluster is still running. In this case I can see it running in both Databricks and GCP. I tried setting pipelines.clusterShutdown.delay too, but it has no effect.
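For reference, pipelines.clusterShutdown.delay is normally set inside the pipeline's configuration map rather than anywhere else in the pipeline definition. A minimal sketch of where it would sit; the pipeline name, notebook path, and the duration value here are illustrative assumptions, not values from this post:

```python
# Sketch of Delta Live Tables pipeline settings, showing where the
# pipelines.clusterShutdown.delay option would live. The name, notebook
# path, and duration value are illustrative assumptions.
pipeline_settings = {
    "name": "example-dlt-pipeline",
    "development": True,  # development mode keeps the cluster warm between runs
    "configuration": {
        # Duration string controlling how long the development cluster idles
        # before shutting down; the exact value to use is an assumption here.
        "pipelines.clusterShutdown.delay": "60s",
    },
    "libraries": [{"notebook": {"path": "/Repos/example/dlt_notebook"}}],
}
```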

In all three cases, the cluster keeps running until I delete it manually from GCP. How can I make sure my clusters shut down properly on GCP so I don't get charged?

Thank you.
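For reference, the 2-hour idle timeout described in case 1 corresponds to the `autotermination_minutes` field of a cluster spec in the Databricks Clusters API. A minimal sketch of such a spec for a single-node GCP cluster; the runtime version and node type are illustrative placeholders, not values from this thread:

```python
# Sketch of a cluster spec as it could be sent to the Databricks Clusters API
# (POST /api/2.1/clusters/create). Runtime version and node type are assumed
# placeholders; only autotermination_minutes reflects the 2-hour setting above.
cluster_spec = {
    "cluster_name": "auto-terminating-test",
    "spark_version": "13.3.x-scala2.12",  # assumed LTS runtime
    "node_type_id": "n2-highmem-4",       # assumed GCP node type
    "num_workers": 0,                     # single-node cluster
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
    "autotermination_minutes": 120,       # idle timeout: 2 hours, as in case 1
}
print(cluster_spec["autotermination_minutes"])  # → 120
```

Note that this setting only governs the Databricks-managed Spark compute; as the accepted answer below explains, the long-lived GKE infrastructure in the GCP account is a separate resource.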

1 ACCEPTED SOLUTION


LandanG
Databricks Employee

Hi @Tilemachos Charalampous,

The compute resources in your GCP account might not be the Spark clusters but rather the GKE cluster that Databricks spins up to host the Databricks architecture in your account.

The note in the blue highlight in the docs here https://docs.gcp.databricks.com/administration-guide/account-settings-gcp/workspaces.html#create-and... goes into this in more detail.

If no clusters are running but you still see a Databricks-created GKE cluster, it would most likely be that.
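To make the distinction concrete, here is a small illustrative sketch that separates a long-lived, Databricks-managed GKE cluster from other clusters in a project by name. The `db-` prefix is an assumption about how Databricks names its GKE cluster and may differ in your account; check the workspace docs for the exact naming:

```python
# Illustrative only: given the names of clusters visible in a GCP project,
# pick out the ones that look like Databricks-managed GKE infrastructure.
# The "db-" prefix is an assumed naming convention, not a documented fact.
def databricks_gke_clusters(cluster_names):
    """Return the names that look like Databricks-managed GKE clusters."""
    return [name for name in cluster_names if name.startswith("db-")]

sample = ["db-1234567890123456", "my-app-gke", "staging"]
print(databricks_gke_clusters(sample))  # → ['db-1234567890123456']
```

A cluster that matches this pattern and persists while all Databricks compute shows as terminated is the infrastructure cluster described above, not a rogue Spark cluster.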


3 REPLIES

638555
New Contributor III

Digging more into this, I realized that even if I terminate the cluster on GCP, it is automatically respawned shortly afterward.

All clusters on Databricks are terminated, and no job clusters or pools appear either.

So I have a rogue GCP Databricks-created cluster running constantly.


638555
New Contributor III

Hi @Landan George, thanks for the answer. This looks correct; I probably missed it while going over the documentation. Thanks for helping.
