<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Identifying workload in azure and AWS in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/146689#M4794</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/201130"&gt;@anshu_roy&lt;/a&gt;, I will try this and let you know if it works. Thank you for the suggestion.&lt;/P&gt;</description>
    <pubDate>Tue, 03 Feb 2026 08:29:57 GMT</pubDate>
    <dc:creator>Saurabh_kanoje</dc:creator>
    <dc:date>2026-02-03T08:29:57Z</dc:date>
    <item>
      <title>Identifying workload in azure and AWS</title>
      <link>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/145533#M4771</link>
      <description>&lt;P&gt;&lt;SPAN&gt;We are looking for some Python code that can help us: we need an overview of all Databricks workspaces, their owner names, and mainly the runtime versions they use, across every Azure and AWS subscription that we manage.&lt;BR /&gt;Can someone please suggest actions here?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 13:42:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/145533#M4771</guid>
      <dc:creator>Saurabh_kanoje</dc:creator>
      <dc:date>2026-01-28T13:42:06Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying workload in azure and AWS</title>
      <link>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/146593#M4781</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;You can use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/admin/system-tables/workspaces" target="_self"&gt;&lt;SPAN class="inline-flex" aria-label="Workspaces system table reference | Databricks on AWS" data-state="closed"&gt;&lt;SPAN class="text-box-trim-both"&gt;Workspace System tables&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/A&gt;to get an overview of each workspace in your Databricks account for both Azure and AWS and combine them to build a global view.&lt;SPAN class="inline-flex" aria-label="Workspaces system table reference | Databricks on AWS" data-state="closed"&gt;​&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;These tables let you query key metadata such as workspace ID, name, URL, cloud, and lifecycle status from the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;system.access.workspaces_latest&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;table.&lt;SPAN class="inline-flex" aria-label="Workspaces system table reference | Databricks on AWS" data-state="closed"&gt;​&lt;/SPAN&gt;&lt;BR /&gt;You can then join this with other system tables like&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;system.billing.usage&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or compute/system tables to enrich your overview with usage, cost, and compute details per workspace.&lt;SPAN class="inline-flex" aria-label="Workspaces system table reference | Databricks on AWS" data-state="closed"&gt;​&lt;/SPAN&gt;&lt;BR /&gt;By running these queries in a central account-level warehouse, you can regularly export the results (for example,&amp;nbsp; to a dashboard) and use them as a single inventory of all workspaces across clouds.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 02 Feb 2026 16:09:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/146593#M4781</guid>
      <dc:creator>anshu_roy</dc:creator>
      <dc:date>2026-02-02T16:09:33Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying workload in azure and AWS</title>
      <link>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/146689#M4794</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/201130"&gt;@anshu_roy&lt;/a&gt;, I will try this and let you know if it works. Thank you for the suggestion.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2026 08:29:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/146689#M4794</guid>
      <dc:creator>Saurabh_kanoje</dc:creator>
      <dc:date>2026-02-03T08:29:57Z</dc:date>
    </item>
    <item>
      <title>Re: Identifying workload in azure and AWS</title>
      <link>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/150175#M4977</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/196498"&gt;@Saurabh_kanoje&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;There are two complementary approaches to get an overview of all your Databricks workspaces, their owners, and the runtime versions in use across Azure and AWS. I will walk through both.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;APPROACH 1: SYSTEM TABLES (RECOMMENDED, NO EXTERNAL SDK REQUIRED)&lt;/P&gt;
&lt;P&gt;If your workspaces are all under the same Databricks account and Unity Catalog is enabled, system tables give you a single-pane view from any workspace in the account. No additional libraries are needed, just SQL or PySpark.&lt;/P&gt;
&lt;P&gt;Step 1: List all workspaces&lt;/P&gt;
&lt;P&gt;The system.access.workspaces_latest table contains metadata for every active workspace in your account:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;SELECT
  workspace_id,
  workspace_name,
  workspace_url,
  status,
  create_time
FROM system.access.workspaces_latest
WHERE status = 'RUNNING'
ORDER BY workspace_name&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Step 2: Get cluster owners and runtime versions per workspace&lt;/P&gt;
&lt;P&gt;The system.compute.clusters table tracks every cluster configuration, including who owns it and which Databricks Runtime it runs:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;SELECT
  w.workspace_name,
  c.workspace_id,
  c.cluster_id,
  c.cluster_name,
  c.owned_by,
  c.dbr_version,
  c.driver_node_type,
  c.worker_node_type,
  c.cluster_source,
  c.create_time,
  c.delete_time
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
  ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
ORDER BY w.workspace_name, c.cluster_name&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This gives you a cross-workspace inventory of every active cluster, its owner, and runtime version, all from a single query.&lt;/P&gt;
&lt;P&gt;Step 3: Runtime version summary across workspaces&lt;/P&gt;
&lt;P&gt;To get a summary of which runtime versions are in use per workspace:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;SELECT
  w.workspace_name,
  c.dbr_version,
  COUNT(*) AS cluster_count
FROM system.compute.clusters c
JOIN system.access.workspaces_latest w
  ON c.workspace_id = w.workspace_id
WHERE c.delete_time IS NULL
GROUP BY w.workspace_name, c.dbr_version
ORDER BY w.workspace_name, c.dbr_version&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note: System tables require Unity Catalog to be enabled and you need account admin privileges (or explicit GRANT of USE and SELECT on the system schema). The workspaces_latest table only contains currently active workspaces; cancelled workspaces are removed.&lt;/P&gt;
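&lt;P&gt;For reference, the grants mentioned above might look like this (a hypothetical example with a placeholder group name; syntax follows the Unity Catalog GRANT statement):&lt;/P&gt;

```sql
-- Hypothetical: give a group read access to the system schemas used here.
GRANT USE SCHEMA ON SCHEMA system.access TO `platform-admins`;
GRANT SELECT ON TABLE system.access.workspaces_latest TO `platform-admins`;
GRANT USE SCHEMA ON SCHEMA system.compute TO `platform-admins`;
GRANT SELECT ON TABLE system.compute.clusters TO `platform-admins`;
```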
&lt;P&gt;Documentation references:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/admin/system-tables/index.html" target="_blank"&gt;https://docs.databricks.com/en/admin/system-tables/index.html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/admin/system-tables/billing.html" target="_blank"&gt;https://docs.databricks.com/en/admin/system-tables/billing.html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/admin/system-tables/compute.html" target="_blank"&gt;https://docs.databricks.com/en/admin/system-tables/compute.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;APPROACH 2: DATABRICKS PYTHON SDK (ACCOUNT-LEVEL API)&lt;/P&gt;
&lt;P&gt;If you need to pull this information programmatically outside of a notebook, or need to combine it with Azure/AWS subscription metadata, the Databricks SDK for Python provides an AccountClient that can list workspaces across your account, and a WorkspaceClient that can list clusters per workspace.&lt;/P&gt;
&lt;P&gt;Step 1: Install the SDK&lt;/P&gt;
&lt;P&gt;&lt;CODE&gt;pip install databricks-sdk&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;Step 2: List all workspaces with AccountClient&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from databricks.sdk import AccountClient

# Authenticate with account-level credentials.
# Set DATABRICKS_ACCOUNT_ID, DATABRICKS_HOST
# (https://accounts.cloud.databricks.com for AWS,
# https://accounts.azuredatabricks.net for Azure),
# and credentials (OAuth, PAT, etc.)
a = AccountClient()&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE&gt;for ws in a.workspaces.list():
    print(f"Workspace: {ws.workspace_name}, ID: {ws.workspace_id}, Status: {ws.workspace_status}")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Step 3: Iterate over workspaces and list clusters with runtime versions&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;from databricks.sdk import AccountClient, WorkspaceClient

a = AccountClient()&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE&gt;results = []
for ws in a.workspaces.list():
    try:
        # Create a WorkspaceClient for each workspace.
        # For Azure, use the workspace URL directly.
        # Authenticate per workspace (OAuth, PAT, or service principal).
        w = WorkspaceClient(
            host=f"https://{ws.deployment_name}.cloud.databricks.com"
        )
        for cluster in w.clusters.list():
            results.append({
                "workspace_name": ws.workspace_name,
                "workspace_id": ws.workspace_id,
                "cluster_name": cluster.cluster_name,
                "cluster_id": cluster.cluster_id,
                "owner": cluster.creator_user_name,
                "runtime_version": cluster.spark_version,
                "state": cluster.state.value if cluster.state else None,
            })
    except Exception as e:
        print(f"Could not connect to {ws.workspace_name}: {e}")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE&gt;# Convert to a DataFrame for easy viewing
import pandas as pd

df = pd.DataFrame(results)
print(df.to_string())&lt;/CODE&gt;&lt;/PRE&gt;
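&lt;P&gt;If pandas is not available, the per-runtime summary from Step 3 of Approach 1 can also be computed over the same results list in pure Python; the rows below are made-up stand-ins for real SDK output:&lt;/P&gt;

```python
from collections import Counter

# Made-up sample rows shaped like the `results` list built above
# (only the fields needed for the summary are shown).
results = [
    {"workspace_name": "ws-a", "runtime_version": "14.3.x-scala2.12"},
    {"workspace_name": "ws-a", "runtime_version": "14.3.x-scala2.12"},
    {"workspace_name": "ws-b", "runtime_version": "15.4.x-scala2.12"},
]

# Count clusters per (workspace, runtime version), mirroring the
# GROUP BY in the Step 3 query from Approach 1.
summary = Counter((r["workspace_name"], r["runtime_version"]) for r in results)

for (workspace, runtime), count in sorted(summary.items()):
    print(f"{workspace}: {runtime} -> {count} cluster(s)")
```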
&lt;P&gt;Important notes on the SDK approach:&lt;BR /&gt;- AccountClient requires account-level authentication. For Azure, the host is &lt;A href="https://accounts.azuredatabricks.net" target="_blank"&gt;https://accounts.azuredatabricks.net&lt;/A&gt;. For AWS, it is &lt;A href="https://accounts.cloud.databricks.com" target="_blank"&gt;https://accounts.cloud.databricks.com&lt;/A&gt;.&lt;BR /&gt;- You need a service principal or user with account admin privileges.&lt;BR /&gt;- Notebook-native authentication does not work with AccountClient, so you must explicitly set credentials via environment variables, a .databrickscfg profile, or constructor arguments.&lt;BR /&gt;- For cross-cloud scenarios (both Azure and AWS), you would need separate AccountClient instances, one per account/cloud.&lt;/P&gt;
&lt;P&gt;Documentation references:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/dev-tools/sdk-python.html" target="_blank"&gt;https://docs.databricks.com/en/dev-tools/sdk-python.html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/dev-tools/auth/index.html" target="_blank"&gt;https://docs.databricks.com/en/dev-tools/auth/index.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;BONUS: BILLING USAGE FOR WORKLOAD IDENTIFICATION&lt;/P&gt;
&lt;P&gt;If you also want to understand which workloads are consuming resources (not just what clusters exist), the system.billing.usage table is very helpful:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;SELECT
  w.workspace_name,
  u.usage_metadata.cluster_id,
  u.billing_origin_product,
  u.sku_name,
  u.identity_metadata.run_as AS run_as_user,
  SUM(u.usage_quantity) AS total_dbus,
  MIN(u.usage_date) AS first_usage,
  MAX(u.usage_date) AS last_usage
FROM system.billing.usage u
JOIN system.access.workspaces_latest w
  ON u.workspace_id = w.workspace_id
WHERE u.usage_date &amp;gt;= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY ALL
ORDER BY total_dbus DESC&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This shows which workspaces, clusters, and users are consuming the most DBUs, broken down by product type (JOBS, SQL, etc.).&lt;/P&gt;
&lt;P&gt;I would recommend starting with the system tables approach since it requires no additional setup beyond Unity Catalog and gives you a unified view across all workspaces and cloud providers under the same account.&lt;/P&gt;
&lt;P&gt;* This reply was drafted with an agent system I built, which researches and drafts responses based on a wide set of documentation and previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 07:27:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/identifying-workload-in-azure-and-aws/m-p/150175#M4977</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T07:27:48Z</dc:date>
    </item>
  </channel>
</rss>

