Determining spill from system tables
01-03-2025 03:37 PM
I'm trying to optimize machine selection (D-, E-, or L-series VMs on Azure) for job clusters and all-purpose compute, and I'm struggling to identify where performance degrades because of disk spill. Spill would indicate that more memory is needed. I can see it in the Spark UI, but I'm looking for historical diagnostics.
As of January 2025, system.compute.node_timeline surfaces useful metrics, but it does not report spill explicitly.
https://docs.databricks.com/en/admin/system-tables/compute.html#node-timeline-table-schema
Help appreciated.
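For reference, this is roughly how I'm pulling memory pressure from node_timeline today (column names per the schema doc linked above; swap usage is the closest proxy for memory pressure I've found, but it isn't spill):

```python
# Per-node memory pressure over the last week from
# system.compute.node_timeline. There is no spill column, so
# mem_swap_percent is only an indirect signal.
df = spark.sql("""
    SELECT
        cluster_id,
        instance_id,
        start_time,
        mem_used_percent,
        mem_swap_percent
    FROM system.compute.node_timeline
    WHERE start_time >= date_sub(current_date(), 7)
    ORDER BY mem_swap_percent DESC
    LIMIT 100
""")
display(df)
```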
01-03-2025 04:42 PM
For historical diagnostics, you may need to set up a custom logging mechanism that captures these metrics over time and stores them in persistent storage, such as a Delta table or a logging service. That way you can query and analyze historical performance data, including disk spill, at any point in the future.
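A minimal sketch of that idea, not a tested solution: Spark's monitoring REST API exposes memoryBytesSpilled and diskBytesSpilled per stage, so you can poll it from the driver and append the results to a table. Assumptions to verify for your workspace: the endpoint is reachable at spark.sparkContext.uiWebUrl (the Spark UI is proxied on Databricks, so this may need adjusting), and the target table name here is hypothetical.

```python
# Poll the Spark REST API for per-stage spill metrics and persist them
# so they can be queried historically alongside node_timeline.
import requests
from pyspark.sql import Row, functions as F

ui_url = spark.sparkContext.uiWebUrl  # e.g. http://<driver-host>:4040
app_id = requests.get(f"{ui_url}/api/v1/applications").json()[0]["id"]
stages = requests.get(f"{ui_url}/api/v1/applications/{app_id}/stages").json()

# clusterUsageTags.clusterId is a Databricks-set Spark conf; capturing it
# lets you join this data back to system.compute.node_timeline later.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId", "unknown")

rows = [
    Row(
        cluster_id=cluster_id,
        app_id=app_id,
        stage_id=s["stageId"],
        attempt_id=s["attemptId"],
        status=s["status"],
        memory_bytes_spilled=s.get("memoryBytesSpilled", 0),
        disk_bytes_spilled=s.get("diskBytesSpilled", 0),
    )
    for s in stages
]

if rows:
    (spark.createDataFrame(rows)
         .withColumn("captured_at", F.current_timestamp())
         .write.mode("append")
         .saveAsTable("main.observability.stage_spill"))  # hypothetical table
```

Running something like this at the end of each job (or on an interval from a small monitoring notebook) builds up a queryable spill history you can correlate with node_timeline by cluster_id and time window when evaluating instance types.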

