โ01-28-2026 05:42 AM
we are looking for some python codes that can helps us, we need to have an overview of all Databricks workspaces, their owner names, and mainly the runtime versions that they use, in every Azure and AWS subscriptions that we manage.
Can someone please suggest actions here
โ02-02-2026 08:09 AM
Hello,
You can use Workspace System tables to get an overview of each workspace in your Databricks account for both Azure and AWS and combine them to build a global view.โ
These tables let you query key metadata such as workspace ID, name, URL, cloud, and lifecycle status from the system.access.workspaces_latest table.โ
You can then join this with other system tables like system.billing.usage or compute/system tables to enrich your overview with usage, cost, and compute details per workspace.โ
By running these queries in a central account-level warehouse, you can regularly export the results (for example, to a dashboard) and use them as a single inventory of all workspaces across clouds.
โ02-03-2026 12:29 AM
Hi @anshu_roy will try this and let you know if this works thank you for the suggestion
โ03-07-2026 11:27 PM
Hi @Saurabh_kanoje,
There are two complementary approaches to get an overview of all your Databricks workspaces, their owners, and the runtime versions in use across Azure and AWS. I will walk through both.
APPROACH 1: SYSTEM TABLES (RECOMMENDED, NO EXTERNAL SDK REQUIRED)
If your workspaces are all under the same Databricks account and Unity Catalog is enabled, system tables give you a single-pane view from any workspace in the account. No additional libraries are needed, just SQL or PySpark.
Step 1: List all workspaces
The system.access.workspaces_latest table contains metadata for every active workspace in your account:
SELECT
workspace_id,
workspace_name,
workspace_url,
status,
create_time
FROM system.access.workspaces_latest
WHERE status = 'RUNNING'
ORDER BY workspace_name
Step 2: Get cluster owners and runtime versions per workspace
The system.compute.clusters table tracks every cluster configuration, including who owns it and which Databricks Runtime it runs:
SELECT
w.workspace_name,
c.workspace_id,
c.cluster_id,
c.cluster_name,
c.owned_by,
c.dbr_version,
c.driver_node_type,
c.worker_node_type,
c.cluster_source,
c.create_time,
c.delete_time
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
ORDER BY w.workspace_name, c.cluster_name
This gives you a cross-workspace inventory of every active cluster, its owner, and runtime version, all from a single query.
Step 3: Runtime version summary across workspaces
To get a summary of which runtime versions are in use per workspace:
SELECT
w.workspace_name,
c.dbr_version,
COUNT(*) AS cluster_count
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
GROUP BY w.workspace_name, c.dbr_version
ORDER BY w.workspace_name, c.dbr_version
Note: System tables require Unity Catalog to be enabled and you need account admin privileges (or explicit GRANT of USE and SELECT on the system schema). The workspaces_latest table only contains currently active workspaces; cancelled workspaces are removed.
Documentation references:
https://docs.databricks.com/en/admin/system-tables/index.html
https://docs.databricks.com/en/admin/system-tables/billing.html
https://docs.databricks.com/en/admin/system-tables/compute.html
APPROACH 2: DATABRICKS PYTHON SDK (ACCOUNT-LEVEL API)
If you need to pull this information programmatically outside of a notebook, or need to combine it with Azure/AWS subscription metadata, the Databricks SDK for Python provides an AccountClient that can list workspaces across your account, and a WorkspaceClient that can list clusters per workspace.
Step 1: Install the SDK
pip install databricks-sdk
Step 2: List all workspaces with AccountClient
from databricks.sdk import AccountClient
# Authenticate with account-level credentials
# Set DATABRICKS_ACCOUNT_ID, DATABRICKS_HOST (e.g., https://accounts.cloud.databricks.com
# or https://accounts.azuredatabricks.net), and credentials (OAuth, PAT, etc.)
a = AccountClient()
for ws in a.workspaces.list():
print(f"Workspace: {ws.workspace_name}, ID: {ws.workspace_id}, Status: {ws.workspace_status}")
Step 3: Iterate over workspaces and list clusters with runtime versions
from databricks.sdk import AccountClient, WorkspaceClient
import os
a = AccountClient()
results = []
for ws in a.workspaces.list():
try:
# Create a WorkspaceClient for each workspace
w = WorkspaceClient(
host=f"https://{ws.deployment_name}.cloud.databricks.com"
# For Azure, use the workspace URL directly
# Authenticate per workspace (OAuth, PAT, or service principal)
)
for cluster in w.clusters.list():
results.append({
"workspace_name": ws.workspace_name,
"workspace_id": ws.workspace_id,
"cluster_name": cluster.cluster_name,
"cluster_id": cluster.cluster_id,
"owner": cluster.creator_user_name,
"runtime_version": cluster.spark_version,
"state": cluster.state.value if cluster.state else None,
})
except Exception as e:
print(f"Could not connect to {ws.workspace_name}: {e}")
# Convert to a DataFrame for easy viewing
import pandas as pd
df = pd.DataFrame(results)
print(df.to_string())
Important notes on the SDK approach:
- AccountClient requires account-level authentication. For Azure, the host is https://accounts.azuredatabricks.net. For AWS, it is https://accounts.cloud.databricks.com.
- You need a service principal or user with account admin privileges.
- Notebook-native authentication does not work with AccountClient, so you must explicitly set credentials via environment variables, a .databrickscfg profile, or constructor arguments.
- For cross-cloud scenarios (both Azure and AWS), you would need separate AccountClient instances, one per account/cloud.
Documentation references:
https://docs.databricks.com/en/dev-tools/sdk-python.html
https://docs.databricks.com/en/dev-tools/auth/index.html
BONUS: BILLING USAGE FOR WORKLOAD IDENTIFICATION
If you also want to understand which workloads are consuming resources (not just what clusters exist), the system.billing.usage table is very helpful:
SELECT
w.workspace_name,
u.usage_metadata.cluster_id,
u.billing_origin_product,
u.sku_name,
u.identity_metadata.run_as AS run_as_user,
SUM(u.usage_quantity) AS total_dbus,
MIN(u.usage_date) AS first_usage,
MAX(u.usage_date) AS last_usage
FROM system.billing.usage u
JOIN system.access.workspaces_latest w
ON u.workspace_id = w.workspace_id
WHERE u.usage_date >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY ALL
ORDER BY total_dbus DESC
This shows which workspaces, clusters, and users are consuming the most DBUs, broken down by product type (JOBS, SQL, etc.).
I would recommend starting with the system tables approach since it requires no additional setup beyond Unity Catalog and gives you a unified view across all workspaces and cloud providers under the same account.
* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.