How to fetch max or avg worker nodes for a specific time period using system tables?
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-21-2025 02:58 AM
3 REPLIES 3
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-21-2025 04:30 AM
Hi @pranitha,
You can utilize the node_timeline
table. https://docs.databricks.com/en/admin/system-tables/compute.html
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-21-2025 05:22 AM - edited 01-21-2025 05:25 AM
try this
SELECT
cluster_id,
MAX(max_autoscale_workers) AS max_workers,
DATE_TRUNC('day', change_time) AS day
FROM
system.compute.clusters
WHERE
change_time BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY
cluster_id,
DATE_TRUNC('day', change_time)
ORDER BY
day;
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-21-2025 06:45 AM
Hi @pranitha
Use this query to get the cluster details along with cost info as well.
WITH hourly_metrics AS (
SELECT
date_trunc('hour', usage_start_time) as hour,
usage_metadata.cluster_id,
sku_name,
MAX(usage_quantity) as max_usage,
ROUND(AVG(usage_quantity), 2) as avg_usage,
SUM(usage_quantity) as DBUs_consumed
FROM system.billing.usage
WHERE usage_start_time BETWEEN '2025-01-01' AND '2025-01-21'
GROUP BY
date_trunc('hour', usage_start_time),
usage_metadata.cluster_id,
sku_name
),
cost_calculation AS (
SELECT
h.*,
l.pricing.default AS unitprice,
ROUND((h.DBUs_consumed * l.pricing.default)) AS total_cost
FROM hourly_metrics h
LEFT OUTER JOIN system.billing.list_prices l ON l.sku_name = h.sku_name
)
SELECT
cluster_id,
sku_name,
hour,
max_usage,
avg_usage,
DBUs_consumed,
unitprice,
total_cost
FROM cost_calculation
ORDER BY hour DESC, total_cost DESC;

