You can try the following query on your system tables:
```sql
WITH cluster_usage AS (
  SELECT
    u.usage_metadata.cluster_id,
    u.usage_date,
    u.usage_start_time,
    u.usage_end_time,
    DATEDIFF(SECOND, u.usage_start_time, u.usage_end_time) AS duration_seconds,
    u.usage_quantity AS dbus_used,
    CASE
      WHEN u.usage_metadata.job_id IS NOT NULL THEN 'job'
      WHEN u.usage_metadata.notebook_id IS NOT NULL THEN 'notebook'
      ELSE 'interactive'
    END AS usage_type,
    u.usage_metadata.job_id,
    u.usage_metadata.notebook_id
  FROM system.billing.usage u
  WHERE u.usage_date >= DATEADD(DAY, -30, CURRENT_DATE)
    AND u.usage_metadata.cluster_id IS NOT NULL
),
-- system.compute.clusters keeps one row per configuration change, so
-- deduplicate to the latest record per cluster before joining; otherwise
-- usage rows are counted once per historical config version.
latest_clusters AS (
  SELECT *
  FROM system.compute.clusters
  QUALIFY ROW_NUMBER() OVER (PARTITION BY cluster_id ORDER BY change_time DESC) = 1
)
SELECT
  c.cluster_id,
  c.cluster_name,
  c.dbr_version,
  c.worker_count,
  c.auto_termination_minutes,
  cu.usage_date,
  SUM(cu.duration_seconds) AS total_duration_seconds,
  SUM(CASE WHEN cu.usage_type = 'interactive' THEN cu.duration_seconds ELSE 0 END) AS idle_seconds,
  SUM(cu.dbus_used) AS total_dbus_used,
  COUNT(DISTINCT CASE WHEN cu.usage_type = 'job' THEN cu.job_id END) AS unique_jobs_run,
  AVG(CASE WHEN cu.usage_type = 'job' THEN cu.duration_seconds END) AS avg_job_duration_seconds
FROM latest_clusters c
JOIN cluster_usage cu
  ON c.cluster_id = cu.cluster_id
GROUP BY
  c.cluster_id,
  c.cluster_name,
  c.dbr_version,
  c.worker_count,
  c.auto_termination_minutes,
  cu.usage_date
ORDER BY c.cluster_name, cu.usage_date;
```
This query provides:
- Cluster configuration details (cluster_id, cluster_name, Databricks Runtime version, worker count, auto-termination setting)
- Daily usage statistics
- Idle time estimation (this treats any usage without a job or notebook ID as idle, which is a rough proxy: interactive all-purpose usage can still be active work)
- Total DBUs used
- Number of unique jobs run
- Average job duration
To identify whether a cluster is over-provisioned or under-utilizing its configuration, look for patterns such as:
- High idle time relative to total duration
- Low DBU usage compared to the worker count
- Few jobs running on clusters with many workers
- Long average job durations on clusters with many workers
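The patterns above can be turned into rough flags with a follow-up query. As a sketch, assume the daily results of the query above were saved as a view named `cluster_daily_stats` (a hypothetical name); the thresholds below are arbitrary starting points you should tune for your workloads:

```sql
-- Sketch: flag potentially over-provisioned clusters from the daily stats.
-- cluster_daily_stats is a hypothetical view over the query above;
-- the 0.5, 10, 8, and 5 thresholds are arbitrary assumptions.
SELECT *
FROM (
  SELECT
    cluster_id,
    cluster_name,
    worker_count,
    SUM(idle_seconds) / NULLIF(SUM(total_duration_seconds), 0) AS idle_ratio,
    SUM(total_dbus_used) / NULLIF(worker_count, 0)             AS dbus_per_worker,
    -- Approximate: summing daily distinct counts double-counts jobs
    -- that run on multiple days.
    SUM(unique_jobs_run)                                       AS jobs_30d
  FROM cluster_daily_stats
  GROUP BY cluster_id, cluster_name, worker_count
)
WHERE idle_ratio > 0.5                        -- mostly idle
   OR dbus_per_worker < 10                    -- low DBU usage for the size
   OR (worker_count >= 8 AND jobs_30d < 5)    -- many workers, few jobs
ORDER BY idle_ratio DESC;
```

Clusters surfaced here are candidates for a smaller worker count, autoscaling, or a shorter auto-termination window, not automatic downsizing targets; check their actual workloads first.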