Data Engineering

Understanding High I/O Wait Despite High CPU Utilization in system.compute Metrics

saicharandeepb
New Contributor

Hi everyone,

I'm working on building a hardware metrics dashboard using the system.compute schema in Databricks, specifically leveraging the cluster, node_type, and node_timeline tables.
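
For reference, here is a minimal sketch of the kind of query this involves (assuming a Databricks notebook where spark is already in scope and the system.compute schema is enabled in the workspace; the 7-day window and column names other than cpu_user_percent and cpu_wait_percent are illustrative assumptions):

# Pull per-node CPU metrics from system.compute.node_timeline
# (instance_id, start_time, cpu_system_percent are assumed column names)
cpu_metrics = spark.sql("""
    SELECT
        cluster_id,
        instance_id,
        start_time,
        cpu_user_percent,
        cpu_system_percent,
        cpu_wait_percent
    FROM system.compute.node_timeline
    WHERE start_time >= date_sub(current_date(), 7)  -- illustrative 7-day window
""")
display(cpu_metrics)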

While analyzing the data, I came across something that seems contradictory to common industry guidance:

It's generally accepted that if I/O wait exceeds 10%, it indicates CPU performance degradation due to the processor waiting on disk or network I/O.

However, in several cases in my data I noticed intervals where cpu_wait_percent is above 10% while cpu_user_percent is still above 90%, which suggests the CPU is actively doing useful work.
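
To make that concrete, this is roughly how I'm surfacing those rows (same notebook assumptions as the sketch above; the 10% and 90% thresholds are just the ones from the guidance quoted earlier, and cluster_id/start_time are assumed column names):

# Sketch: samples where I/O wait and user CPU are both high in the same interval
contradictory = spark.sql("""
    SELECT
        cluster_id,
        start_time,
        cpu_user_percent,
        cpu_wait_percent
    FROM system.compute.node_timeline
    WHERE cpu_wait_percent > 10
      AND cpu_user_percent > 90
    ORDER BY cpu_wait_percent DESC
""")
display(contradictory)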

This seems counterintuitive. Shouldn't high I/O wait reduce the CPU's ability to perform user-level tasks?

Has anyone else observed this behavior in Databricks or Spark environments? Could this be due to how metrics are sampled or reported in the system tables? Or is it possible that multiple cores are being utilized in parallel, masking the impact of I/O wait?

Any insights or explanations would be greatly appreciated!

Thanks in advance!

0 REPLIES
