Hi @Saurabh_kanoje,
There are two complementary approaches to get an overview of all your Databricks workspaces, their owners, and the runtime versions in use across Azure and AWS. I will walk through both.
APPROACH 1: SYSTEM TABLES (RECOMMENDED, NO EXTERNAL SDK REQUIRED)
If your workspaces are all under the same Databricks account and Unity Catalog is enabled, system tables give you a single-pane view from any workspace in the account. No additional libraries are needed, just SQL or PySpark.
Step 1: List all workspaces
The system.access.workspaces_latest table contains metadata for every active workspace in your account:
SELECT
  workspace_id,
  workspace_name,
  workspace_url,
  status,
  create_time
FROM system.access.workspaces_latest
WHERE status = 'RUNNING'
ORDER BY workspace_name
Step 2: Get cluster owners and runtime versions per workspace
The system.compute.clusters table tracks every cluster configuration, including who owns it and which Databricks Runtime it runs:
SELECT
  w.workspace_name,
  c.workspace_id,
  c.cluster_id,
  c.cluster_name,
  c.owned_by,
  c.dbr_version,
  c.driver_node_type,
  c.worker_node_type,
  c.cluster_source,
  c.create_time,
  c.delete_time
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
  ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
ORDER BY w.workspace_name, c.cluster_name
This gives you a cross-workspace inventory of every active cluster, its owner, and runtime version, all from a single query.
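One caveat when you post-process the inventory in Python: dbr_version strings (for example "14.3.x-scala2.12") sort lexicographically, so "9.1" lands after "14.3". A small parser makes them sortable numerically. This is an illustrative sketch; the exact version-string format is an assumption based on common DBR naming, so it falls back defensively.

```python
import re

def parse_dbr_version(dbr_version: str):
    """Extract (major, minor) from a runtime string like '14.3.x-scala2.12'.

    Returns (0, 0) for strings that do not start with 'major.minor' --
    the format is an assumption, so handle surprises defensively.
    """
    match = re.match(r"(\d+)\.(\d+)", dbr_version)
    if not match:
        return (0, 0)
    return (int(match.group(1)), int(match.group(2)))

# Sorting runtime strings numerically instead of lexicographically:
versions = ["9.1.x-scala2.12", "14.3.x-scala2.12", "13.3.x-scala2.12"]
ordered = sorted(versions, key=parse_dbr_version)
```

The same key function works for finding the oldest runtime per workspace with min().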
Step 3: Runtime version summary across workspaces
To get a summary of which runtime versions are in use per workspace:
SELECT
  w.workspace_name,
  c.dbr_version,
  COUNT(*) AS cluster_count
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
  ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
GROUP BY w.workspace_name, c.dbr_version
ORDER BY w.workspace_name, c.dbr_version
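If you prefer to collect the rows and summarize in Python (for example after spark.sql(...).collect() in a notebook), the same GROUP BY can be expressed with a Counter. A minimal sketch over plain dicts; the key names simply mirror the query's output columns.

```python
from collections import Counter

def runtime_summary(clusters):
    """Count clusters per (workspace_name, dbr_version), mirroring the
    GROUP BY in the SQL summary query. `clusters` is a list of dicts
    with 'workspace_name' and 'dbr_version' keys.
    """
    return Counter((c["workspace_name"], c["dbr_version"]) for c in clusters)

# Hypothetical inventory rows for illustration:
rows = [
    {"workspace_name": "prod", "dbr_version": "14.3.x-scala2.12"},
    {"workspace_name": "prod", "dbr_version": "14.3.x-scala2.12"},
    {"workspace_name": "dev", "dbr_version": "13.3.x-scala2.12"},
]
summary = runtime_summary(rows)
```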
Note: System tables require Unity Catalog to be enabled, and you need account admin privileges (or an explicit GRANT of USE SCHEMA and SELECT on the relevant system schemas). The workspaces_latest table only contains currently active workspaces; cancelled workspaces are removed.
Documentation references:
https://docs.databricks.com/en/admin/system-tables/index.html
https://docs.databricks.com/en/admin/system-tables/billing.html
https://docs.databricks.com/en/admin/system-tables/compute.html
APPROACH 2: DATABRICKS PYTHON SDK (ACCOUNT-LEVEL API)
If you need to pull this information programmatically outside of a notebook, or need to combine it with Azure/AWS subscription metadata, the Databricks SDK for Python provides an AccountClient that can list workspaces across your account, and a WorkspaceClient that can list clusters per workspace.
Step 1: Install the SDK
pip install databricks-sdk
Step 2: List all workspaces with AccountClient
from databricks.sdk import AccountClient

# Authenticate with account-level credentials.
# Set DATABRICKS_ACCOUNT_ID, DATABRICKS_HOST (e.g., https://accounts.cloud.databricks.com
# or https://accounts.azuredatabricks.net), and credentials (OAuth, PAT, etc.)
a = AccountClient()

for ws in a.workspaces.list():
    print(f"Workspace: {ws.workspace_name}, ID: {ws.workspace_id}, Status: {ws.workspace_status}")
Step 3: Iterate over workspaces and list clusters with runtime versions
import pandas as pd
from databricks.sdk import AccountClient, WorkspaceClient

a = AccountClient()
results = []

for ws in a.workspaces.list():
    try:
        # Create a WorkspaceClient for each workspace.
        # For Azure, use the workspace URL directly instead of the
        # deployment-name pattern below, and authenticate per workspace
        # (OAuth, PAT, or service principal).
        w = WorkspaceClient(host=f"https://{ws.deployment_name}.cloud.databricks.com")
        for cluster in w.clusters.list():
            results.append({
                "workspace_name": ws.workspace_name,
                "workspace_id": ws.workspace_id,
                "cluster_name": cluster.cluster_name,
                "cluster_id": cluster.cluster_id,
                "owner": cluster.creator_user_name,
                "runtime_version": cluster.spark_version,
                "state": cluster.state.value if cluster.state else None,
            })
    except Exception as e:
        print(f"Could not connect to {ws.workspace_name}: {e}")

# Convert to a DataFrame for easy viewing
df = pd.DataFrame(results)
print(df.to_string())
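Once you have the results list, a common follow-up is flagging clusters on old runtimes. A hedged sketch: the dict shape matches the loop above, but the threshold and the "major.minor" version format are my assumptions, not anything the SDK guarantees.

```python
def flag_outdated(results, min_major=13):
    """Return clusters whose runtime major version is below `min_major`.

    `results` holds dicts shaped like the SDK collection loop; entries
    whose runtime string does not start with a number are skipped.
    """
    outdated = []
    for r in results:
        version = r.get("runtime_version") or ""
        major = version.split(".", 1)[0]
        if major.isdigit() and int(major) < min_major:
            outdated.append(r)
    return outdated

# Hypothetical rows for illustration:
sample = [
    {"cluster_name": "etl", "runtime_version": "10.4.x-scala2.12"},
    {"cluster_name": "ml", "runtime_version": "14.3.x-scala2.12"},
]
old = flag_outdated(sample)
```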
Important notes on the SDK approach:
- AccountClient requires account-level authentication. For Azure, the host is https://accounts.azuredatabricks.net. For AWS, it is https://accounts.cloud.databricks.com.
- You need a service principal or user with account admin privileges.
- Notebook-native authentication does not work with AccountClient, so you must explicitly set credentials via environment variables, a .databrickscfg profile, or constructor arguments.
- For cross-cloud scenarios (both Azure and AWS), you would need separate AccountClient instances, one per account/cloud.
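Since the account console host differs per cloud, a tiny lookup helper keeps scripts honest when you loop over multiple accounts. The two host values are the documented account endpoints from the notes above; the helper itself is my convenience function, not part of the SDK.

```python
# Documented account-console endpoints per cloud; the helper is just a
# convenience wrapper, not an SDK API.
ACCOUNT_HOSTS = {
    "aws": "https://accounts.cloud.databricks.com",
    "azure": "https://accounts.azuredatabricks.net",
}

def account_host(cloud: str) -> str:
    """Return the account-console host for a cloud, case-insensitively."""
    try:
        return ACCOUNT_HOSTS[cloud.lower()]
    except KeyError:
        raise ValueError(f"Unknown cloud {cloud!r}; expected one of {sorted(ACCOUNT_HOSTS)}")
```

You would then pass the result as the host argument when constructing each AccountClient.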
Documentation references:
https://docs.databricks.com/en/dev-tools/sdk-python.html
https://docs.databricks.com/en/dev-tools/auth/index.html
BONUS: BILLING USAGE FOR WORKLOAD IDENTIFICATION
If you also want to understand which workloads are consuming resources (not just what clusters exist), the system.billing.usage table is very helpful:
SELECT
  w.workspace_name,
  u.usage_metadata.cluster_id,
  u.billing_origin_product,
  u.sku_name,
  u.identity_metadata.run_as AS run_as_user,
  SUM(u.usage_quantity) AS total_dbus,
  MIN(u.usage_date) AS first_usage,
  MAX(u.usage_date) AS last_usage
FROM system.billing.usage u
JOIN system.access.workspaces_latest w
  ON u.workspace_id = w.workspace_id
WHERE u.usage_date >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY ALL
ORDER BY total_dbus DESC
This shows which workspaces, clusters, and users are consuming the most DBUs, broken down by product type (JOBS, SQL, etc.).
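If you collect those billing rows into Python for reporting, ranking the top consumers is a simple aggregation. A sketch under stated assumptions: the dict keys mirror the query's output column aliases, and the sample numbers are made up for illustration.

```python
def top_consumers(usage_rows, n=5):
    """Sum DBUs per (workspace_name, run_as_user) and return the top n
    as (key, total) pairs, largest first. `usage_rows` mirrors the
    billing query's output columns.
    """
    totals = {}
    for row in usage_rows:
        key = (row["workspace_name"], row["run_as_user"])
        totals[key] = totals.get(key, 0.0) + row["total_dbus"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical collected rows:
rows = [
    {"workspace_name": "prod", "run_as_user": "etl-sp", "total_dbus": 120.0},
    {"workspace_name": "prod", "run_as_user": "alice", "total_dbus": 30.0},
    {"workspace_name": "dev", "run_as_user": "bob", "total_dbus": 75.0},
]
leaders = top_consumers(rows, n=2)
```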
I would recommend starting with the system tables approach since it requires no additional setup beyond Unity Catalog and gives you a unified view across all workspaces and cloud providers under the same account.
* This reply used an agent system I built to research and draft this response, based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.