A lot of Databricks spend isn’t “compute” at all — it’s paid idle time on all‑purpose clusters while they sit around waiting for Auto Termination. The Databricks UI is great at showing Starting/Running/Terminating states, but it often hides the key operational questions:
- Is this cluster actually doing work right now, or just burning time until shutdown?
- Which scheduled jobs are running on an all‑purpose cluster (and when)?
A simple case from my article:
- The job finishes in 6m 12s
- The cluster then stays up for ~30 more minutes due to the termination timeout
- You pay for ~36 minutes total, of which ~30 minutes is pure idle, and that waste is easiest to miss during off-hours/night runs.
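The arithmetic above can be sketched in a few lines (the numbers come from the article; the function and variable names are mine, not any Databricks API):

```python
# Idle-cost math for the scenario above.
JOB_MINUTES = 6 + 12 / 60   # job runtime: 6m 12s
AUTOTERM_MINUTES = 30       # Auto Termination timeout

def idle_share(job_minutes: float, autoterm_minutes: float) -> float:
    """Fraction of paid cluster time spent idle after the job ends."""
    total = job_minutes + autoterm_minutes
    return autoterm_minutes / total

paid = JOB_MINUTES + AUTOTERM_MINUTES          # ~36.2 paid minutes
share = idle_share(JOB_MINUTES, AUTOTERM_MINUTES)
print(f"paid: {paid:.1f} min, idle share: {share:.0%}")  # ~83% idle
```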
Under the same assumptions, my numbers showed a job cluster can be up to 12.5× cheaper, largely because it avoids that expensive “waiting window”.
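The multiplier comes from two factors stacking: the per-DBU price gap between all-purpose and jobs compute, and the ratio of paid minutes to useful minutes. A minimal sketch, with placeholder rates that are my illustration rather than the article’s assumptions:

```python
def cost_ratio(rate_ap: float, rate_job: float,
               paid_minutes: float, job_minutes: float) -> float:
    """All-purpose cost / job-cluster cost for the same workload.

    rate_ap, rate_job -- per-unit prices for each cluster type
    paid_minutes      -- runtime + idle wait on the all-purpose cluster
    job_minutes       -- actual runtime (a job cluster stops when done)
    """
    return (rate_ap * paid_minutes) / (rate_job * job_minutes)

# Placeholder rates; plug in your own SKU prices and timings to
# reproduce the article's figure under its assumptions.
print(cost_ratio(rate_ap=0.55, rate_job=0.15,
                 paid_minutes=36.2, job_minutes=6.2))
```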
I wrote up the approach and built a more visual monitoring view to spot these leaks fast and fix them by adjusting settings or choosing the right cluster type.
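As a starting point for that kind of monitoring, here is a hedged sketch that flags running all-purpose clusters with long (or disabled) Auto Termination via the Clusters REST API. The endpoint and field names follow the public Databricks API docs; the `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variables and the 20-minute threshold are my assumptions, not the article’s tool:

```python
import os
import requests

IDLE_THRESHOLD_MIN = 20  # arbitrary cutoff for "long" autotermination; tune to taste

def flag_idle_risks(clusters: list[dict], threshold: int = IDLE_THRESHOLD_MIN) -> list[dict]:
    """Return running all-purpose clusters whose autotermination window
    exceeds the threshold. Pure function, so it is easy to test offline.
    Note: autotermination_minutes == 0 means auto-termination is disabled,
    which is the worst case, so it is always flagged."""
    return [
        c for c in clusters
        if c.get("cluster_source") != "JOB"   # skip ephemeral job clusters
        and c.get("state") == "RUNNING"
        and (c.get("autotermination_minutes", 0) == 0
             or c.get("autotermination_minutes", 0) > threshold)
    ]

def main() -> None:
    host = os.environ["DATABRICKS_HOST"]    # workspace URL, my naming assumption
    token = os.environ["DATABRICKS_TOKEN"]  # PAT with permission to list clusters
    resp = requests.get(
        f"{host}/api/2.1/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    for c in flag_idle_risks(resp.json().get("clusters", [])):
        print(c["cluster_id"], c.get("cluster_name"), c.get("autotermination_minutes"))

# main()  # uncomment to run against your workspace
```

From there, scheduling this check (or feeding the same filter into a dashboard) gives a quick list of clusters that are likely burning paid idle time right now.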