Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Programmatically Retrieve Cluster Memory Usage?

Akuhei05
New Contributor II

Hi!

I need help with the following:

  1. Programmatically retrieve the maximum memory configured for the cluster attached to the notebook/job - I think this is achievable through the system tables or Clusters API, but I'm open to other suggestions
  2. Execute a job on this cluster and, upon its completion, determine the amount of memory utilized during the job, and get this information programmatically inside a simple notebook. Note: Ganglia UI is out of the question, since we are on LTS 13.3. We also have a Spark-based listener implemented whose logs are ingested into ADX, but I haven't found a metric like this there.

Could you provide guidance so that I can create a Delta table that includes these statistics?

Thank you!

3 REPLIES

anardinelli
New Contributor III

Hi @Akuhei05, how are you?

For the first topic, you can add a cell to your notebook that reads the Spark configuration for executor memory each time it runs, for whichever cluster is attached. For this, please see below:

# Per-executor memory as configured on the attached cluster (e.g. "7284m").
spark_memory = spark.sparkContext.getConf().get('spark.executor.memory')
print(spark_memory)
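
Note that spark.executor.memory is the per-executor value, not the total memory provisioned for the cluster. If you also want the configured cluster-wide memory, a minimal sketch against the Clusters API could look like the following. The workspace URL and the secret scope/key are placeholders you would replace with your own, and it assumes a fixed-size cluster (autoscaling clusters report an autoscale range instead of num_workers):

import requests

# Placeholders: your workspace URL and a PAT with permission to read clusters.
HOST = "https://<your-workspace>.azuredatabricks.net"
TOKEN = dbutils.secrets.get("my-scope", "my-token")  # hypothetical secret scope/key
headers = {"Authorization": f"Bearer {TOKEN}"}

# The attached cluster exposes its own ID through a Spark conf tag.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")

# Cluster spec: node types and worker count.
cluster = requests.get(
    f"{HOST}/api/2.0/clusters/get",
    headers=headers,
    params={"cluster_id": cluster_id},
).json()

# Node type catalogue, which includes memory_mb per node type.
node_types = requests.get(
    f"{HOST}/api/2.0/clusters/list-node-types", headers=headers
).json()["node_types"]
mem_by_type = {nt["node_type_id"]: nt["memory_mb"] for nt in node_types}

workers_mb = mem_by_type[cluster["node_type_id"]] * cluster.get("num_workers", 0)
driver_mb = mem_by_type[cluster.get("driver_node_type_id", cluster["node_type_id"])]
print(f"Configured cluster memory: {workers_mb + driver_mb} MB "
      f"({cluster.get('num_workers', 0)} workers + driver)")

The same information should also be available from the compute system tables you mentioned, if those are enabled in your workspace.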

For the second point, when you say "determine the amount of memory utilized during the job", do you mean the maximum used in total, the usage per worker, or the sum across workers?

Best,

Alessandro

Hi Alessandro,

Thank you for your help and suggestion! 

For the second point, I'm looking to analyze the memory utilization over the duration of the job. Specifically, I want to know the average & total memory used during a single job run compared to the total memory available in that specific cluster - set by prior configuration. However, any additional useful metrics (like per worker) that I can access in the notebook would also be appreciated.

I'm thinking of creating a Delta table to save these statistics to. I'd like to run performance tests for specific use cases and see how certain metrics change with different cluster types for a given number of records, to establish a baseline. Later, we plan to integrate this into our CI/CD pipeline to optionally track, at an approximate level, how much our changes affect the baseline performance.
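
For reference, the table I have in mind would look roughly like this - the column names and example values are just illustrative placeholders, not something produced by an existing API:

from datetime import datetime
from pyspark.sql import Row

# Hypothetical schema for per-run statistics; the columns would be filled from
# whatever the Clusters API / listener metrics actually return.
metrics_row = Row(
    run_id="manual-test-001",   # placeholder run identifier
    cluster_id=spark.conf.get("spark.databricks.clusterUsageTags.clusterId"),
    configured_memory_mb=229376,   # e.g. from the Clusters API
    avg_memory_used_mb=96000.0,    # e.g. aggregated from executor metrics
    peak_memory_used_mb=150000.0,
    record_count=10000000,
    captured_at=datetime.utcnow(),
)

(spark.createDataFrame([metrics_row])
    .write.format("delta")
    .mode("append")
    .saveAsTable("perf_baselines.job_memory_stats"))  # hypothetical table name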

anardinelli
New Contributor III

Great use case!

Have you heard about Prometheus with Spark 3.0? It's a tool that can export live metrics for your jobs and runs, writing to a sink that you can read as a stream. I've personally never used it for this kind of use case, but with it you can monitor every metric, write it out, and build insights from it (such as averages and totals) in a separate pipeline, which can finally become a table.

To better understand, you can check these links below:

1. Session on how to use and enable Prometheus in Databricks: https://www.youtube.com/watch?v=FDzm3MiSfiE

2. Spark official guide: https://spark.apache.org/docs/3.1.1/monitoring.html
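
If you want something lighter to start with before wiring up Prometheus, the same monitoring guide also documents Spark's REST API, which you can poll from the driver while the job is running. A rough sketch, assuming the endpoint behind sc.uiWebUrl is reachable from your notebook (it normally is, since the notebook runs on the driver):

import requests

sc = spark.sparkContext
base = sc.uiWebUrl            # e.g. http://<driver-ip>:<ui-port>
app_id = sc.applicationId

# Per-executor summary; memoryUsed/maxMemory refer to storage memory.
executors = requests.get(f"{base}/api/v1/applications/{app_id}/executors").json()

total_used = sum(e["memoryUsed"] for e in executors)
total_max = sum(e["maxMemory"] for e in executors)
print(f"Storage memory in use: {total_used / 1e9:.2f} GB of {total_max / 1e9:.2f} GB")

# Spark 3.0+ also reports peak memory metrics per executor (when available).
for e in executors:
    peak = e.get("peakMemoryMetrics", {})
    print(e["id"], peak.get("JVMHeapMemory"), peak.get("OnHeapExecutionMemory"))

Sampling this periodically during the run (or reading the same metrics from your existing listener) would give you the averages and peaks to store in your Delta table.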

Best,

Alessandro

 
