A lot of Databricks spend isn’t “compute” at all — it’s paid idle time on all‑purpose clusters while they sit around waiting for Auto Termination. The Databricks UI is great at showing Starting/Running/Terminating states, but it often hides the key operational questions:
- Is this cluster actually doing work right now, or just burning time until shutdown?
- Which scheduled jobs are running on an all‑purpose cluster (and when)?
A simple case from my article:
- The job finishes in 6m 12s
- The cluster then stays up for ~30 more minutes due to the termination timeout
- You pay for ~36 minutes total, of which ~30 minutes is pure idle, and that waste is easiest to miss during off-hours/night runs.
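The arithmetic above can be sketched in a few lines (the numbers come from the article; the function and variable names are mine, not any Databricks API):

```python
# Idle-cost math for the scenario above.
JOB_MINUTES = 6 + 12 / 60   # job runtime: 6m 12s
AUTOTERM_MINUTES = 30       # Auto Termination timeout

def idle_share(job_minutes: float, autoterm_minutes: float) -> float:
    """Fraction of paid cluster time spent idle after the job ends."""
    total = job_minutes + autoterm_minutes
    return autoterm_minutes / total

paid = JOB_MINUTES + AUTOTERM_MINUTES          # ~36.2 paid minutes
share = idle_share(JOB_MINUTES, AUTOTERM_MINUTES)
print(f"paid: {paid:.1f} min, idle share: {share:.0%}")  # ~83% idle
```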
Under the same assumptions, my numbers showed a job cluster can be up to 12.5× cheaper, largely because it avoids that expensive “waiting window”.
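The multiplier comes from two factors stacking: the per-DBU price gap between all-purpose and jobs compute, and the ratio of paid minutes to useful minutes. A minimal sketch, with placeholder rates that are my illustration rather than the article’s assumptions:

```python
def cost_ratio(rate_ap: float, rate_job: float,
               paid_minutes: float, job_minutes: float) -> float:
    """All-purpose cost / job-cluster cost for the same workload.

    rate_ap, rate_job -- per-unit prices for each cluster type
    paid_minutes      -- runtime + idle wait on the all-purpose cluster
    job_minutes       -- actual runtime (a job cluster stops when done)
    """
    return (rate_ap * paid_minutes) / (rate_job * job_minutes)

# Placeholder rates; plug in your own SKU prices and timings to
# reproduce the article's figure under its assumptions.
print(cost_ratio(rate_ap=0.55, rate_job=0.15,
                 paid_minutes=36.2, job_minutes=6.2))
```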
I wrote up the approach and built a more visual monitoring view to spot these leaks fast and fix them by adjusting settings or choosing the right cluster type.
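As a starting point for that kind of monitoring, here is a hedged sketch that flags running all-purpose clusters with long (or disabled) Auto Termination via the Clusters REST API. The endpoint and field names follow the public Databricks API docs; the `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variables and the 20-minute threshold are my assumptions, not the article’s tool:

```python
import os
import requests

IDLE_THRESHOLD_MIN = 20  # arbitrary cutoff for "long" autotermination; tune to taste

def flag_idle_risks(clusters: list[dict], threshold: int = IDLE_THRESHOLD_MIN) -> list[dict]:
    """Return running all-purpose clusters whose autotermination window
    exceeds the threshold. Pure function, so it is easy to test offline.
    Note: autotermination_minutes == 0 means auto-termination is disabled,
    which is the worst case, so it is always flagged."""
    return [
        c for c in clusters
        if c.get("cluster_source") != "JOB"   # skip ephemeral job clusters
        and c.get("state") == "RUNNING"
        and (c.get("autotermination_minutes", 0) == 0
             or c.get("autotermination_minutes", 0) > threshold)
    ]

def main() -> None:
    host = os.environ["DATABRICKS_HOST"]    # workspace URL, my naming assumption
    token = os.environ["DATABRICKS_TOKEN"]  # PAT with permission to list clusters
    resp = requests.get(
        f"{host}/api/2.1/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    for c in flag_idle_risks(resp.json().get("clusters", [])):
        print(c["cluster_id"], c.get("cluster_name"), c.get("autotermination_minutes"))

# main()  # uncomment to run against your workspace
```

From there, scheduling this check (or feeding the same filter into a dashboard) gives a quick list of clusters that are likely burning paid idle time right now.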