Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Programmatically Retrieve Cluster Memory Usage?

Akuhei05
New Contributor II

Hi!

I need help with the following:

  1. Programmatically retrieve the maximum memory configured for the cluster attached to the notebook/job. I think this is achievable through the system tables or the Clusters API (rough sketch of what I have in mind below), but I'm open to other suggestions.
  2. Execute a job on this cluster and, upon its completion, determine the amount of memory utilized during the job, and get this information programmatically inside a simple notebook. Note: the Ganglia UI is out of the question since we are on LTS 13.3. We also have a Spark-based listener implemented whose logs are ingested into ADX, but I haven't found a metric like this there.
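
For reference, this is roughly what I have in mind for point 1, using the Clusters API through the Databricks SDK for Python. It is only an untested sketch: the clusterUsageTags conf, the node-type lookup, and the memory arithmetic are my own assumptions, and autoscaling clusters would need extra handling.

# Sketch: estimate the configured memory of the attached cluster via the Clusters API.
# Assumes the databricks-sdk package is available on the cluster (e.g. %pip install databricks-sdk).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Databricks sets this conf on every cluster; it identifies the cluster the notebook runs on.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
cluster = w.clusters.get(cluster_id=cluster_id)

# Map node type ids to the memory (MB) of a single node of that type.
node_memory_mb = {nt.node_type_id: nt.memory_mb for nt in w.clusters.list_node_types().node_types}

driver_mb = node_memory_mb[cluster.driver_node_type_id or cluster.node_type_id]
worker_mb = node_memory_mb[cluster.node_type_id]
num_workers = cluster.num_workers or 0  # for autoscaling clusters this only reflects the current size

total_mb = driver_mb + worker_mb * num_workers
print(f"Configured cluster memory: {total_mb} MB ({num_workers} workers + driver)")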

Could you provide guidance so that I can create a Delta table that includes these statistics?

Thank you!

3 REPLIES

anardinelli
Databricks Employee

Hi @Akuhei05, how are you?

For the first topic, you can add a cell to your notebook that reads the Spark configuration for the executor memory of the attached cluster each time it runs. For this, please see below:

spark_memory = spark.sparkContext.getConf().get('spark.executor.memory')
print(spark_memory)
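
If you also need the driver side, the same approach should work. A small sketch (it assumes the conf is set on your cluster, hence the fallback default):

# Driver memory as configured for the attached cluster, e.g. "8g"; falls back if the conf is absent.
driver_memory = spark.sparkContext.getConf().get('spark.driver.memory', 'not set')
print(driver_memory)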

For the second point, when you say "determine the amount of memory utilized during the job", do you mean the maximum used in total, the maximum per worker, or the sum across workers?

Best,

Alessandro

Akuhei05
New Contributor II

Hi Alessandro,

Thank you for your help and suggestion! 

For the second point, I'm looking to analyze memory utilization over the duration of the job. Specifically, I want to know the average and total memory used during a single job run compared with the total memory available on that specific cluster (set by its prior configuration). However, any additional useful metrics (such as per-worker figures) that I can access in the notebook would also be appreciated.

I'm thinking of creating a Delta table to save these statistics to. I'd like to run performance tests for specific use cases and see how certain metrics change across different cluster types for a given number of records, to establish a baseline. Later, we plan to integrate this into our CI/CD pipeline so we can optionally track, at an approximate level, how much our changes affect the baseline performance.
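
To make it concrete, the kind of write I'm picturing for that table looks roughly like this (just a sketch: the column names, the placeholder values, and the table name are all made up):

# Sketch: append one row of run-level statistics to a Delta table.
# All values below are placeholders; in practice they would come from the cluster
# configuration lookup and from whatever memory metrics we end up collecting.
from datetime import datetime, timezone
from pyspark.sql import Row

stats = [Row(
    run_ts=datetime.now(timezone.utc),
    cluster_id="0000-000000-placeholder",  # placeholder cluster id
    total_memory_mb=32768,                 # placeholder: configured cluster memory
    avg_used_memory_mb=18200,              # placeholder: average over the run
    peak_used_memory_mb=27400,             # placeholder: peak over the run
    record_count=1000000,                  # placeholder: size of the test dataset
)]

(spark.createDataFrame(stats)
    .write.format("delta")
    .mode("append")
    .saveAsTable("perf_tests.memory_usage_stats"))  # hypothetical table name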

anardinelli
Databricks Employee

Great use case!

Have you ever heard about Prometheus with Spark 3.0? It's a tool that can export live metrics for your jobs and runs, writing them to a sink that you can read as a stream. I've personally never used it for this exact use case, but with it you can monitor every metric and write it out, then compute insights on top of it (such as averages and totals) in a separate pipeline, which can finally become a table.

To better understand, you can check the links below (I've also added a rough sketch of the relevant configs after them):

1. Session on how to use and enable Prometheus in Databricks: https://www.youtube.com/watch?v=FDzm3MiSfiE

2. Spark official guide: https://spark.apache.org/docs/3.1.1/monitoring.html
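
As a rough idea of what enabling it can look like, these entries in the cluster's Spark config expose driver and executor metrics in Prometheus format. This is a sketch based on the Spark 3 monitoring guide above, not something I have validated on DBR 13.3:

spark.ui.prometheus.enabled true
spark.metrics.conf.*.sink.prometheusServlet.class org.apache.spark.metrics.sink.PrometheusServlet
spark.metrics.conf.*.sink.prometheusServlet.path /metrics/prometheus

With that in place, the executor metrics should be exposed under the driver's /metrics/executors/prometheus endpoint, which Prometheus (or any other scraper) can collect and aggregate per run.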

Best,

Alessandro

 
