You can try the following query on your system tables:
```sql
WITH cluster_usage AS (
  SELECT
    u.usage_metadata.cluster_id,
    u.usage_date,
    u.usage_start_time,
    u.usage_end_time,
    DATEDIFF(SECOND, u.usage_start_time, u.usage_end_time) AS duration_seconds,
    u.usage_quantity AS dbus_used,
    CASE
      WHEN u.usage_metadata.job_id IS NOT NULL THEN 'job'
      WHEN u.usage_metadata.notebook_id IS NOT NULL THEN 'notebook'
      ELSE 'interactive'
    END AS usage_type,
    u.usage_metadata.job_id,
    u.usage_metadata.notebook_id
  FROM system.billing.usage u
  WHERE u.usage_date >= DATEADD(DAY, -30, CURRENT_DATE)
    AND u.usage_metadata.cluster_id IS NOT NULL
),
-- system.compute.clusters keeps one row per configuration change, so
-- deduplicate to the latest record per cluster before joining; otherwise
-- usage rows are counted once per historical config version.
latest_clusters AS (
  SELECT *
  FROM system.compute.clusters
  QUALIFY ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) = 1
)
SELECT
  c.cluster_id,
  c.cluster_name,
  c.dbr_version,
  c.worker_count,
  c.auto_termination_minutes,
  cu.usage_date,
  SUM(cu.duration_seconds) AS total_duration_seconds,
  SUM(CASE WHEN cu.usage_type = 'interactive' THEN cu.duration_seconds ELSE 0 END) AS idle_seconds,
  SUM(cu.dbus_used) AS total_dbus_used,
  COUNT(DISTINCT CASE WHEN cu.usage_type = 'job' THEN cu.job_id END) AS unique_jobs_run,
  AVG(CASE WHEN cu.usage_type = 'job' THEN cu.duration_seconds END) AS avg_job_duration_seconds
FROM latest_clusters c
JOIN cluster_usage cu
  ON c.cluster_id = cu.cluster_id
GROUP BY
  c.cluster_id,
  c.cluster_name,
  c.dbr_version,
  c.worker_count,
  c.auto_termination_minutes,
  cu.usage_date
ORDER BY c.cluster_name, cu.usage_date;
```
This query provides:
- Cluster configuration details (cluster_id, cluster_name, Databricks Runtime version, worker count, auto-termination setting)
- Daily usage statistics
- Idle time estimation (this treats any usage without a job or notebook ID as idle, which is a rough proxy: interactive all-purpose usage can still be active work)
- Total DBUs used
- Number of unique jobs run
- Average job duration
To identify whether a cluster is over-provisioned or under-utilizing its configuration, look for patterns such as:
- High idle time relative to total duration
- Low DBU usage compared to the worker count
- Few jobs running on clusters with many workers
- Long average job durations on clusters with many workers
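The patterns above can be turned into rough flags with a follow-up query. As a sketch, assume the daily results of the query above were saved as a view named `cluster_daily_stats` (a hypothetical name); the thresholds below are arbitrary starting points you should tune for your workloads:

```sql
-- Sketch: flag potentially over-provisioned clusters from the daily stats.
-- cluster_daily_stats is a hypothetical view over the query above;
-- the 0.5, 10, 8, and 5 thresholds are arbitrary assumptions.
SELECT *
FROM (
  SELECT
    cluster_id,
    cluster_name,
    worker_count,
    SUM(idle_seconds) / NULLIF(SUM(total_duration_seconds), 0) AS idle_ratio,
    SUM(total_dbus_used) / NULLIF(worker_count, 0)             AS dbus_per_worker,
    -- Approximate: summing daily distinct counts double-counts jobs
    -- that run on multiple days.
    SUM(unique_jobs_run)                                       AS jobs_30d
  FROM cluster_daily_stats
  GROUP BY cluster_id, cluster_name, worker_count
)
WHERE idle_ratio > 0.5                        -- mostly idle
   OR dbus_per_worker < 10                    -- low DBU usage for the size
   OR (worker_count >= 8 AND jobs_30d < 5)    -- many workers, few jobs
ORDER BY idle_ratio DESC;
```

Clusters surfaced here are candidates for a smaller worker count, autoscaling, or a shorter auto-termination window, not automatic downsizing targets; check their actual workloads first.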