cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Warehouse Name in System Tables

valjas
New Contributor III

Hello.

I am creating a table to monitor the usage of All-purpose Compute and SQL Warehouses. From the tables in 'system' catalog, I can get cluster_name and cluster_id. However only warehouse_id is available and not warehouse name. Is there a way to get the name of the warehouse in the system tables?

I thought of creating a table and having all the warehouse ids and names updated here. But, it would mean, this table has to be updated everytime a new warehouse is created.

Is there a place where databricks stores warehouse names along with ids? If not, please suggest a work around to automatially log all the warehouses that are created. 

1 REPLY 1

Kaniz
Community Manager
Community Manager

Hi @valjasTo monitor and manage SQL warehouses in your Databricks workspace, you can utilize the warehouse events system table. This table records events related to warehouse activity, including when a warehouse starts, stops, scales up, or scales down. It’s a valuable resource for tracking changes to your warehouses.

Here are the key details about the warehouse events system table:

  1. Logged Warehouse Event Types:

    • SCALED_UP: A new cluster was added to the warehouse.
    • SCALED_DOWN: A cluster was removed from the warehouse.
    • STOPPING: The warehouse is in the process of stopping.
    • RUNNING: The warehouse is actively running.
    • STARTING: The warehouse is in the process of starting up.
    • STOPPED: The warehouse has completely stopped running.
  2. Warehouse Events Schema:

    • account_id: The ID of your Databricks account.
    • workspace_id: The ID of the workspace where the warehouse is deployed.
    • warehouse_id: The ID of the SQL warehouse associated with the event.
    • event_type: The type of warehouse event (e.g., SCALED_UP, SCALED_DOWN, RUNNING).
    • cluster_count: The number of clusters actively running.
    • event_time: Timestamp of when the event occurred.
  3. Sample Queries:

    • To identify actively running warehouses and their running time in hours:
      USE CATALOG `system`;
      SELECT
          we.warehouse_id,
          we.event_time,
          TIMESTAMPDIFF(MINUTE, we.event_time, CURRENT_TIMESTAMP()) / 60.0 AS running_hours,
          we.cluster_count
      FROM compute.warehouse_events we
      WHERE we.event_type = 'RUNNING'
          AND NOT EXISTS (
              SELECT 1
              FROM compute.warehouse_events we2
              WHERE we2.warehouse_id = we.warehouse_id
                  AND we2.event_time > we.event_time
          );
      
      • You can set an alert condition based on the query results to be notified if a warehouse runs longer than expected.
  4. Automating Warehouse Logging:

    • While Databricks doesn’t directly store warehouse names alongside IDs, you can create a custom solution to automatically log all warehouses created.
    • Here’s a workaround:
      • Set up a Databricks job or notebook that periodically queries the system tables to retrieve warehouse information (including names and IDs).
      • Store this information in a separate table (e.g., a dedicated “Warehouse Metadata” table) within your Databricks workspace.
      • Use triggers or scheduled runs to keep this table updated whenever new warehouses are created or existing ones are modified.

If you have any further questions or need additional assistance, feel free to ask! 🚀🔍

For more details, refer to the Databricks documentation on warehouse events.

 
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.