Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is there any way to monitor the CPU, disk and memory usage of a cluster while a job is running?

SaravananPalani
New Contributor II

I am looking for something similar to Windows Task Manager, which we use to monitor CPU, memory and disk usage on a local desktop.

8 REPLIES

ThomasKastl
Contributor

I would also find this really really useful.

User16301467513
New Contributor II

Spark UI can give you access to some of this information, just not in real-time. It's also intended for Spark-specific performance information such as job and task breakdowns.

Ganglia metrics can give you metrics along these lines both in real time and historically.

In the Clusters page for your particular cluster, select the "Metrics" link and you'll have access to the "Ganglia UI" link (for real-time) and the historical snapshots list.

You can find out more at the Metrics documentation page:

https://docs.databricks.com/user-guide/clusters/metrics.html

Anonymous
Not applicable

Ganglia metrics are not that helpful, and each time the cluster starts you lose the old data.

The question is how to get live metrics and also view historical data.

The OMS agent is the best option in that case. I used it in Azure Databricks and it works wonderfully.

It should be doable in AWS as well, with some modification.
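
For readers who want both a live reading and a history that survives a cluster restart, here is a minimal sketch of the idea using only the Python standard library. It is not the OMS agent and not a Databricks API. It assumes a Linux node, measures only the machine it runs on (for example the driver, if run from a notebook), and the function names are made up for illustration.

```python
import csv
import os
import shutil
import time
from datetime import datetime, timezone


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into a {field: kilobytes} dict."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def mem_used_pct(info):
    """Percent of memory in use, based on MemTotal and MemAvailable."""
    total = info.get("MemTotal", 0)
    if total == 0:
        return None
    return round(100.0 * (total - info.get("MemAvailable", 0)) / total, 1)


def sample(disk_path="/"):
    """Take one reading of load, memory and disk on the local node."""
    with open("/proc/meminfo") as fh:  # assumption: Linux
        info = parse_meminfo(fh.read())
    disk = shutil.disk_usage(disk_path)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "load_1min": os.getloadavg()[0],  # assumption: Unix-like OS
        "mem_used_pct": mem_used_pct(info),
        "disk_used_pct": round(100.0 * disk.used / disk.total, 1),
    }


def record(csv_path, interval_s=10, count=6):
    """Append `count` samples to a CSV file, one every `interval_s` seconds."""
    for i in range(count):
        row = sample()
        new_file = not os.path.exists(csv_path)
        with open(csv_path, "a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(row))
            if new_file:
                writer.writeheader()
            writer.writerow(row)
        if i < count - 1:
            time.sleep(interval_s)
```

Calling `record(...)` with a path on storage that outlives the cluster would keep the history across restarts. Which path that is depends on your workspace, so none is suggested here.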

Meghala
Valued Contributor II

Which one gives truly real-time metrics?

Pelicanine
New Contributor II

Ganglia metrics can give you metrics along these lines both in real time and historically.

youssefmrini
Databricks Employee

You can use the Ganglia UI to track CPU, network, disk and memory. Keep in mind that the Ganglia UI shows a snapshot taken every 15 minutes.

Rajeev_Basu
Contributor III

As a few others have mentioned, the Ganglia UI can be used to track it. We use it in our projects.

hitech88
New Contributor II

Some important things to look at in the Ganglia UI, in the CPU, memory and server load charts, to spot the problem:

CPU chart:

  • User %
  • Idle %

A high user % indicates heavy CPU usage in the cluster.

Memory chart:

  • Use %
  • Free %
  • Swap %

If you see the purple line above the red line in the memory chart, it indicates memory swapping and also points to high memory usage.

Server Load Distribution Chart:

The absence of red squares indicates a balanced load on the cluster. The presence of red squares means there is a hot spot where the load is higher.
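
The three checks above can be sketched as a small function, in case you want to apply the same reasoning to numbers you have copied out of the charts. The thresholds (`cpu_limit`, `hot_factor`) and the node names are hypothetical. Ganglia's own rule for colouring a square red is not reproduced here; a node is simply treated as a hot spot when its load is well above the cluster average.

```python
from statistics import mean


def diagnose(cpu_user_pct, swap_pct, node_loads, cpu_limit=80.0, hot_factor=1.5):
    """Return a list of findings from chart readings.

    cpu_user_pct: User % from the CPU chart
    swap_pct:     Swap % from the memory chart
    node_loads:   {node name: load} from the server load distribution chart
    cpu_limit, hot_factor: illustrative thresholds, not Ganglia settings
    """
    findings = []
    if cpu_user_pct >= cpu_limit:
        findings.append("heavy CPU usage")
    if swap_pct > 0:
        findings.append("memory swapping")
    avg = mean(node_loads.values()) if node_loads else 0.0
    hot = sorted(n for n, load in node_loads.items()
                 if avg > 0 and load > hot_factor * avg)
    if hot:
        findings.append("hot spot on " + ", ".join(hot))
    return findings


# Example with made-up readings for three workers
print(diagnose(92.0, 3.5, {"w1": 1.0, "w2": 1.2, "w3": 6.0}))
```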
