cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to fetch max or avg worker nodes for a specific time period using system tables?

pranitha
New Contributor II
 
3 REPLIES 3

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @pranitha,

You can utilize the node_timeline table. https://docs.databricks.com/en/admin/system-tables/compute.html

saurabh18cs
Honored Contributor

try this 

SELECT
  cluster_id,
 MAX(max_autoscale_workers) AS max_workers,
  DATE_TRUNC('day', change_time) AS day
FROM
    system.compute.clusters
WHERE
  change_time BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY
  cluster_id,
  DATE_TRUNC('day', change_time)
ORDER BY
  day;

MadhuB
Contributor III

Hi @pranitha 

Use this query to get the cluster details along with cost info as well.

 

WITH hourly_metrics AS (
  SELECT
    date_trunc('hour', usage_start_time) as hour,
    usage_metadata.cluster_id,
    sku_name,
    MAX(usage_quantity) as max_usage,
    ROUND(AVG(usage_quantity), 2) as avg_usage,
    SUM(usage_quantity) as DBUs_consumed
  FROM system.billing.usage
  WHERE usage_start_time BETWEEN '2025-01-01' AND '2025-01-21'
  GROUP BY
    date_trunc('hour', usage_start_time),
    usage_metadata.cluster_id,
    sku_name
),
cost_calculation AS (
  SELECT
    h.*,
    l.pricing.default AS unitprice,
    ROUND((h.DBUs_consumed * l.pricing.default)) AS total_cost
  FROM hourly_metrics h
  LEFT OUTER JOIN system.billing.list_prices l ON l.sku_name = h.sku_name
)
SELECT
  cluster_id,
  sku_name,
  hour,
  max_usage,
  avg_usage,
  DBUs_consumed,
  unitprice,
  total_cost
FROM cost_calculation
ORDER BY hour DESC, total_cost DESC;