Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Identifying workloads in Azure and AWS

Saurabh_kanoje
New Contributor II

We are looking for Python code that can help us build an overview of all our Databricks workspaces, their owner names, and, most importantly, the runtime versions they use, across every Azure and AWS subscription that we manage.
Can someone please suggest how to approach this?

3 REPLIES

anshu_roy
Databricks Employee

Hello,

You can use the workspace system tables to get an overview of every workspace in your Databricks account, on both Azure and AWS, and combine them to build a global view.

These tables let you query key metadata such as workspace ID, name, URL, cloud, and lifecycle status from the system.access.workspaces_latest table.
You can then join this with other system tables, such as system.billing.usage or system.compute.clusters, to enrich the overview with usage, cost, and compute details per workspace.
By running these queries in a central account-level warehouse, you can regularly export the results (for example, to a dashboard) and use them as a single inventory of all workspaces across clouds.
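As a rough sketch of the join logic described above, outside of SQL: assuming you have exported rows from system.access.workspaces_latest and system.compute.clusters, combining them into one inventory is a simple keyed join. The field names below mirror the system tables, but the data is invented for illustration.

```python
# Hypothetical exports; field names mirror the system tables, values are made up.
workspaces = [
    {"workspace_id": 111, "workspace_name": "prod-azure", "status": "RUNNING"},
    {"workspace_id": 222, "workspace_name": "prod-aws", "status": "RUNNING"},
]
clusters = [
    {"workspace_id": 111, "cluster_name": "etl", "dbr_version": "14.3.x-scala2.12"},
    {"workspace_id": 222, "cluster_name": "ml", "dbr_version": "13.3.x-scala2.12"},
]

# Join clusters to workspace metadata on workspace_id to build one inventory.
ws_by_id = {w["workspace_id"]: w for w in workspaces}
inventory = [
    {**c, "workspace_name": ws_by_id[c["workspace_id"]]["workspace_name"]}
    for c in clusters
    if c["workspace_id"] in ws_by_id
]
for row in inventory:
    print(row["workspace_name"], row["cluster_name"], row["dbr_version"])
```

In practice you would run the equivalent join directly in SQL against the system tables; this just shows the shape of the combined result.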

 

Hi @anshu_roy, I will try this and let you know if it works. Thank you for the suggestion!

 

SteveOstrowski
Databricks Employee

Hi @Saurabh_kanoje,

There are two complementary approaches to get an overview of all your Databricks workspaces, their owners, and the runtime versions in use across Azure and AWS. I will walk through both.


APPROACH 1: SYSTEM TABLES (RECOMMENDED, NO EXTERNAL SDK REQUIRED)

If your workspaces are all under the same Databricks account and Unity Catalog is enabled, system tables give you a single-pane view from any workspace in the account. No additional libraries are needed, just SQL or PySpark.

Step 1: List all workspaces

The system.access.workspaces_latest table contains metadata for every active workspace in your account:

SELECT
  workspace_id,
  workspace_name,
  workspace_url,
  status,
  create_time
FROM system.access.workspaces_latest
WHERE status = 'RUNNING'
ORDER BY workspace_name;

Step 2: Get cluster owners and runtime versions per workspace

The system.compute.clusters table tracks every cluster configuration, including who owns it and which Databricks Runtime it runs:

SELECT
  w.workspace_name,
  c.workspace_id,
  c.cluster_id,
  c.cluster_name,
  c.owned_by,
  c.dbr_version,
  c.driver_node_type,
  c.worker_node_type,
  c.cluster_source,
  c.create_time,
  c.delete_time
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
  ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
ORDER BY w.workspace_name, c.cluster_name;

This gives you a cross-workspace inventory of every active cluster, its owner, and runtime version, all from a single query.

Step 3: Runtime version summary across workspaces

To get a summary of which runtime versions are in use per workspace:

SELECT
  w.workspace_name,
  c.dbr_version,
  COUNT(*) AS cluster_count
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
  ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
GROUP BY w.workspace_name, c.dbr_version
ORDER BY w.workspace_name, c.dbr_version;

Note: System tables require Unity Catalog to be enabled and you need account admin privileges (or explicit GRANT of USE and SELECT on the system schema). The workspaces_latest table only contains currently active workspaces; cancelled workspaces are removed.
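Once you have the dbr_version values, you may want to flag clusters running outdated runtimes. A small hedged helper for that (the version-string format "major.minor.x-…" is an assumption based on typical dbr_version values, and the 13.3 threshold is purely illustrative):

```python
import re

def dbr_major_minor(dbr_version):
    """Extract (major, minor) from a DBR version string like '13.3.x-scala2.12'.

    The string format is an assumption based on typical dbr_version values;
    returns None when the string does not start with 'major.minor'.
    """
    m = re.match(r"(\d+)\.(\d+)", dbr_version)
    return (int(m.group(1)), int(m.group(2))) if m else None

# Flag clusters below a chosen minimum runtime (threshold is illustrative).
MIN_SUPPORTED = (13, 3)
versions = ["14.3.x-scala2.12", "12.2.x-photon-scala2.12", "custom:dev"]
outdated = [v for v in versions
            if (mm := dbr_major_minor(v)) is not None and mm < MIN_SUPPORTED]
print(outdated)
```

Tuple comparison gives the right ordering for (major, minor) pairs, so you avoid the classic string-comparison trap where "9.1" sorts after "13.3".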

Documentation references:
https://docs.databricks.com/en/admin/system-tables/index.html
https://docs.databricks.com/en/admin/system-tables/billing.html
https://docs.databricks.com/en/admin/system-tables/compute.html


APPROACH 2: DATABRICKS PYTHON SDK (ACCOUNT-LEVEL API)

If you need to pull this information programmatically outside of a notebook, or need to combine it with Azure/AWS subscription metadata, the Databricks SDK for Python provides an AccountClient that can list workspaces across your account, and a WorkspaceClient that can list clusters per workspace.

Step 1: Install the SDK

pip install databricks-sdk

Step 2: List all workspaces with AccountClient

from databricks.sdk import AccountClient

# Authenticate with account-level credentials.
# Set DATABRICKS_ACCOUNT_ID, DATABRICKS_HOST (e.g., https://accounts.cloud.databricks.com
# for AWS or https://accounts.azuredatabricks.net for Azure), and credentials (OAuth, PAT, etc.)
a = AccountClient()

for ws in a.workspaces.list():
    print(f"Workspace: {ws.workspace_name}, ID: {ws.workspace_id}, Status: {ws.workspace_status}")

Step 3: Iterate over workspaces and list clusters with runtime versions

from databricks.sdk import AccountClient, WorkspaceClient
import pandas as pd

a = AccountClient()

results = []
for ws in a.workspaces.list():
    try:
        # Create a WorkspaceClient for each workspace.
        # For Azure, use the workspace URL directly instead of the
        # AWS-style deployment-name host below.
        w = WorkspaceClient(
            host=f"https://{ws.deployment_name}.cloud.databricks.com"
            # Authenticate per workspace (OAuth, PAT, or service principal)
        )
        for cluster in w.clusters.list():
            results.append({
                "workspace_name": ws.workspace_name,
                "workspace_id": ws.workspace_id,
                "cluster_name": cluster.cluster_name,
                "cluster_id": cluster.cluster_id,
                "owner": cluster.creator_user_name,
                "runtime_version": cluster.spark_version,
                "state": cluster.state.value if cluster.state else None,
            })
    except Exception as e:
        print(f"Could not connect to {ws.workspace_name}: {e}")

# Convert to a DataFrame for easy viewing
df = pd.DataFrame(results)
print(df.to_string())

Important notes on the SDK approach:
- AccountClient requires account-level authentication. For Azure, the host is https://accounts.azuredatabricks.net. For AWS, it is https://accounts.cloud.databricks.com.
- You need a service principal or user with account admin privileges.
- Notebook-native authentication does not work with AccountClient, so you must explicitly set credentials via environment variables, a .databrickscfg profile, or constructor arguments.
- For cross-cloud scenarios (both Azure and AWS), you would need separate AccountClient instances, one per account/cloud.
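If you run the collection loop once per AccountClient (one for the Azure account, one for AWS), merging the per-cloud result lists into a single summary is plain Python. The rows below are invented, following the same dict shape the loop builds:

```python
from collections import Counter

# Invented sample rows in the dict shape the collection loop produces,
# one list per Databricks account (cloud).
azure_results = [
    {"workspace_name": "az-prod", "runtime_version": "14.3.x-scala2.12"},
    {"workspace_name": "az-dev", "runtime_version": "13.3.x-scala2.12"},
]
aws_results = [
    {"workspace_name": "aws-prod", "runtime_version": "13.3.x-scala2.12"},
]

# Merge both clouds into one inventory and count clusters per runtime version.
all_results = azure_results + aws_results
by_runtime = Counter(r["runtime_version"] for r in all_results)
for version, count in by_runtime.most_common():
    print(f"{version}: {count} cluster(s)")
```

The same merged list can be fed straight into pandas (as in the loop above) if you prefer a tabular export.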

Documentation references:
https://docs.databricks.com/en/dev-tools/sdk-python.html
https://docs.databricks.com/en/dev-tools/auth/index.html


BONUS: BILLING USAGE FOR WORKLOAD IDENTIFICATION

If you also want to understand which workloads are consuming resources (not just what clusters exist), the system.billing.usage table is very helpful:

SELECT
  w.workspace_name,
  u.usage_metadata.cluster_id,
  u.billing_origin_product,
  u.sku_name,
  u.identity_metadata.run_as AS run_as_user,
  SUM(u.usage_quantity) AS total_dbus,
  MIN(u.usage_date) AS first_usage,
  MAX(u.usage_date) AS last_usage
FROM system.billing.usage u
JOIN system.access.workspaces_latest w
  ON u.workspace_id = w.workspace_id
WHERE u.usage_date >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY ALL
ORDER BY total_dbus DESC;

This shows which workspaces, clusters, and users are consuming the most DBUs, broken down by product type (JOBS, SQL, etc.).
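For reference, the grouping the query performs is equivalent to this small pure-Python sketch (the usage rows are invented and cover only a column subset):

```python
from collections import defaultdict

# Invented usage rows mimicking a few columns of system.billing.usage.
usage_rows = [
    {"workspace_id": 111, "cluster_id": "c1", "usage_quantity": 10.0},
    {"workspace_id": 111, "cluster_id": "c1", "usage_quantity": 5.5},
    {"workspace_id": 222, "cluster_id": "c2", "usage_quantity": 2.0},
]

# GROUP BY (workspace_id, cluster_id) with SUM(usage_quantity),
# mirroring the shape of the SQL aggregation.
totals = defaultdict(float)
for row in usage_rows:
    totals[(row["workspace_id"], row["cluster_id"])] += row["usage_quantity"]

# ORDER BY total_dbus DESC
for key, dbus in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(key, dbus)
```

In practice you should run the aggregation in SQL so the warehouse does the heavy lifting; this just makes the grouping logic explicit.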

I would recommend starting with the system tables approach since it requires no additional setup beyond Unity Catalog and gives you a unified view across all workspaces and cloud providers under the same account.

* This reply was drafted with an agent system I built, which researches and drafts responses from the documentation available to me and from previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update the reply if I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.