Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

SQL queries unable to finish with Databricks SQL Pro warehouse

MiriamHundemer
New Contributor III

During a recent workspace migration (recreation of workspaces), we stumbled upon the problem that it was no longer possible to execute SQL queries (or get sample data from tables) using our SQL warehouse or the shared compute cluster.

A SQL query like SELECT 1; would start but then remain in a limbo state until canceled.

Investigation of the problem revealed that the SQL warehouse and the dev cluster were no longer able to successfully form a cluster, and were therefore unable to execute any query.

Further investigation showed that this was due to a change in the naming schema for workspace-related firewall rules.

The firewall rules are now named after the (sub)network used by the workspace. So instead of names like databricks-<workspace_id>-ingress (the old naming schema), we now have names like db-<subnet_name>-ingress.

The firewall rules are applied to virtual machines according to network tags attached to each virtual machine in a workspace. These tags have the form databricks-<workspace_id>.

Unfortunately, when an old workspace is deleted, the firewall rule is not deleted together with the other resources. When a new workspace is created, its virtual machines get new tags attached to them containing the ID of the new workspace, while the target tags of the leftover firewall rule remain unchanged and still contain the workspace ID of the old (now deleted) workspace.

Therefore, the firewall rule never gets applied to the new workspace's virtual machines, which prevents all cluster creation within the new workspace.
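To make the mismatch concrete, here is a minimal sketch of the matching behavior described above: a firewall rule only applies to a VM if one of the VM's network tags appears among the rule's target tags (the workspace IDs below are made up for illustration).

```python
def rule_applies(rule_target_tags: set[str], vm_tags: set[str]) -> bool:
    """A firewall rule applies to a VM iff the tag sets intersect."""
    return bool(rule_target_tags & vm_tags)

# The leftover rule still targets the deleted workspace's tag...
stale_rule_tags = {"databricks-1111111111111111"}  # old workspace ID
# ...while the recreated workspace's VMs carry a new tag.
new_vm_tags = {"databricks-2222222222222222"}      # new workspace ID

# The tag sets never intersect, so ingress stays blocked.
assert not rule_applies(stale_rule_tags, new_vm_tags)
```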

My expectation when deleting a workspace would be that all related resources in the cloud environment get deleted as well, but this does not seem to be the case (I am actually left with quite a few artifacts after deleting a workspace).

What would be the recommended way to deal with this? Having to remember to manually delete all the leftover resources does not seem feasible to me.

Are there any plans to improve automatic resource cleanup after deleting a workspace?

1 ACCEPTED SOLUTION


youssefmrini
Databricks Employee

1. Infrastructure as Code (IaC) - The "Golden Standard"

If you aren't already, moving your workspace deployment to Terraform is the best way to solve this. Terraform maintains a "state" file that tracks every specific firewall rule and tag it created, so terraform destroy doesn't just delete the workspace; it removes those related resources as well.

  • Benefit: It ensures that if a rule was created for Workspace A, it is removed when Workspace A dies, regardless of the naming schema.

2. Implementation of "Cleanup Tags"

To avoid manual hunting, apply a "Lifecycle" tag to every resource (including firewall rules) during creation, such as Owner: <workspace_id>.

  • You can then run a simple janitor script (via CLI or SDK) every night or after each deletion to find any networking resources whose Owner ID no longer exists in your list of active workspaces.
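A minimal sketch of that janitor logic, with the stale-rule detection kept as pure Python so it can be tested anywhere. In practice the rule list would come from your cloud API (for GCP, e.g. the google-cloud-compute firewall client); the rule names, tags, and IDs below are illustrative.

```python
import re

# A workspace-scoped target tag has the form databricks-<workspace_id>.
TAG_PATTERN = re.compile(r"^databricks-(\d+)$")

def stale_firewall_rules(rules: dict[str, list[str]],
                         active_workspace_ids: set[str]) -> list[str]:
    """Return names of rules whose workspace-scoped target tags
    all point at workspaces that no longer exist."""
    stale = []
    for name, target_tags in rules.items():
        ids = {m.group(1) for t in target_tags if (m := TAG_PATTERN.match(t))}
        if ids and ids.isdisjoint(active_workspace_ids):
            stale.append(name)
    return stale

# Illustrative data (IDs are made up):
rules = {
    "db-subnet-a-ingress": ["databricks-1111"],  # left over from a deleted workspace
    "db-subnet-b-ingress": ["databricks-2222"],  # still active
}
print(stale_firewall_rules(rules, active_workspace_ids={"2222"}))
# → ['db-subnet-a-ingress']
```

Rules flagged by this check are candidates for deletion; review them before removing anything, since non-Databricks rules could coincidentally use a similar tag format.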

3. Move to Serverless SQL Warehouses

If your organization allows it, moving to Serverless SQL bypasses this entirely.

  • How it helps: Serverless compute runs in a Databricks-managed environment rather than your VNET. This means you don't have to manage the firewall rules, subnets, or tags for those SQL Warehouses; Databricks handles the entire lifecycle and cleanup.

