Is there any way to monitor the CPU, disk and memory usage of a cluster while a job is running?

SaravananPalani
New Contributor II

Ideally I'm looking for something similar to Windows Task Manager, which I use to monitor CPU, memory and disk usage on my local desktop.
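As a rough, task-manager-style sketch, the Python below samples CPU load, memory and disk usage on whichever machine runs it, using only the standard library. It assumes a Linux node, because it reads `/proc/meminfo` and calls `os.getloadavg()`. It uses the 1-minute load average as a CPU proxy rather than a CPU percentage. Run from a notebook, it would only see the driver node, not the workers, so it complements Ganglia rather than replacing it. The function names `parse_meminfo`, `sample` and `monitor` are hypothetical.

```python
import os
import shutil
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # first token is the number (kB)
    return info


def sample(disk_path="/"):
    """Take one snapshot of CPU load, memory and disk usage on this node (Linux only)."""
    load_1m, _, _ = os.getloadavg()  # 1-minute load average, used as a CPU proxy
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    total_kb = mem["MemTotal"]
    avail_kb = mem.get("MemAvailable", mem["MemFree"])
    disk = shutil.disk_usage(disk_path)  # named tuple: total, used, free (bytes)
    return {
        "time": time.time(),
        "load_1m": load_1m,
        "cpus": os.cpu_count() or 1,
        "mem_used_pct": 100.0 * (total_kb - avail_kb) / total_kb,
        "swap_used_kb": mem.get("SwapTotal", 0) - mem.get("SwapFree", 0),
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }


def monitor(interval_s=5.0, count=3):
    """Print `count` snapshots, `interval_s` seconds apart, like a minimal task manager."""
    for _ in range(count):
        s = sample()
        print(
            f"load={s['load_1m']:.2f} on {s['cpus']} cpus  "
            f"mem={s['mem_used_pct']:.1f}%  swap={s['swap_used_kb']} kB  "
            f"disk={s['disk_used_pct']:.1f}%"
        )
        time.sleep(interval_s)
```

For example, `monitor(interval_s=10, count=6)` prints one minute of samples. It keeps no history, so it only covers the "live" half of the problem.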

8 REPLIES

ThomasKastl
Contributor

I would also find this really really useful.

User16301467513
New Contributor II

Spark UI can give you access to some of this information, just not in real-time. It's also intended for Spark-specific performance information such as job and task breakdowns.

Ganglia metrics can give you this kind of information, both in real time and historically.

On the Clusters page for your cluster, select the "Metrics" link. From there you can open the "Ganglia UI" link (for real-time metrics) or browse the list of historical snapshots.

You can find out more at the Metrics documentation page:

https://docs.databricks.com/user-guide/clusters/metrics.html

Anonymous
Not applicable

Ganglia metrics are not that helpful, and you also lose the old data when the cluster starts again.

The real question is how to get live metrics and also view historical data.

The OMS agent is the best option in that case. I have used it with Azure Databricks and it works very well.

It should be doable on AWS as well, with some modification.

Meghala
Valued Contributor II

Which of these gives truly real-time metrics?

youssefmrini
Honored Contributor III

You can use the Ganglia UI to track CPU, network, disk and memory usage. Keep in mind that the historical Ganglia metrics are only available as snapshots taken every 15 minutes.

Rajeev_Basu
Contributor III

As a few people have mentioned, the Ganglia UI can be used to track this. We use it in our projects too.

hitech88
New Contributor II

Some important things to look at in the Ganglia UI's CPU, memory and server load charts to spot problems:

CPU chart:

  • User %
  • Idle %

A high User % indicates heavy CPU usage in the cluster.

Memory chart:

  • Use %
  • Free %
  • Swap %

If you see the purple line above the red line in the memory chart, it indicates memory swapping and high memory usage.

Server Load Distribution Chart:

The absence of red squares indicates a balanced load across the cluster. Red squares mark hot spots where the load is higher.
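The checks above can be written down as a small Python sketch. It assumes you have already collected per-node readings (User %, memory Use %, Swap %, 1-minute load and core count) from Ganglia or elsewhere into the hypothetical `NodeSnapshot` record. The thresholds are illustrative assumptions, not Ganglia defaults. Treating load per core above 1.0 as a "red square" is only an approximation of how the load distribution chart colors nodes.

```python
from dataclasses import dataclass


@dataclass
class NodeSnapshot:
    host: str
    cpu_user_pct: float   # "User %" from the CPU chart
    mem_used_pct: float   # "Use %" from the memory chart
    swap_used_pct: float  # "Swap %" from the memory chart
    load_1m: float        # 1-minute load average
    cpus: int


def diagnose(nodes, cpu_busy=80.0, mem_busy=90.0, swap_warn=1.0, hot_ratio=1.0):
    """Flag the symptoms described above; all thresholds are illustrative guesses."""
    warnings = []
    for n in nodes:
        if n.cpu_user_pct >= cpu_busy:
            warnings.append(f"{n.host}: heavy CPU usage (user {n.cpu_user_pct:.0f}%)")
        if n.mem_used_pct >= mem_busy:
            warnings.append(f"{n.host}: high memory usage ({n.mem_used_pct:.0f}%)")
        if n.swap_used_pct >= swap_warn:
            warnings.append(f"{n.host}: swapping ({n.swap_used_pct:.1f}% swap)")
        # Load per core above hot_ratio is treated as a hot spot ("red square").
        if n.load_1m / max(n.cpus, 1) > hot_ratio:
            warnings.append(f"{n.host}: hot spot (load {n.load_1m:.1f} on {n.cpus} cpus)")
    return warnings
```

Keeping the thresholds as parameters makes it easy to tune them to your own workload instead of trusting fixed cut-offs.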
