High memory usage on Databricks cluster

dbuserng — Tue, 21 Jan 2025 09:06:58 GMT

My team sees very high memory usage even when the cluster has just been started and nothing has been run yet. Additionally, memory usage never drops to lower levels: total used memory always fluctuates around 14GB.

Where is this memory usage coming from? Is it possible to see more detailed information about exactly which processes are consuming our memory?
We are trying to figure out the best way to allocate memory to our executors. Assuming that we have 14GB of memory available on each worker node and one executor per worker node, how much memory should the executor get in total? Currently our spark.executor.memory is set to 7GB. We are wondering whether we could increase it a bit, but we are not sure how much memory should be left for Databricks processes (judging by the chart, it looks like a lot).

I would appreciate your help!

Re: High memory usage on Databricks cluster

-werners- — Tue, 21 Jan 2025 09:27:28 GMT

This is not necessarily an issue. Linux uses a lot of RAM for caching, but that memory can still be released when processes need it (dynamic memory management).
Basically the philosophy is that RAM that is not used (so actually 'free') is wasted.
Here is a Reddit topic about Linux and RAM. Might explain a little.
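
To check how much of the "used" memory is really just cache, you can read /proc/meminfo on the node. Below is a minimal sketch, not an official Databricks tool: the helper names parse_meminfo and summarize are made up for illustration, and if you run it in a notebook cell it only reports on the machine where the Python process runs (normally the driver, not the workers).

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def summarize(info):
    """Split total memory into free, cache, and non-cache usage, in GiB."""
    kib_per_gib = 1024 ** 2
    total = info["MemTotal"]
    free = info["MemFree"]
    # Buffers and Cached are page cache the kernel can largely give back on demand.
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    # MemAvailable is the kernel's own estimate; fall back to free + cache if it is missing.
    available = info.get("MemAvailable", free + cache)
    return {
        "total_gib": total / kib_per_gib,
        "free_gib": free / kib_per_gib,
        "cache_gib": cache / kib_per_gib,
        "available_gib": available / kib_per_gib,
        "used_excluding_cache_gib": (total - free - cache) / kib_per_gib,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        for name, value in summarize(parse_meminfo(meminfo.read_text())).items():
            print(f"{name}: {value:.2f}")
```

If available_gib is much larger than free_gib, most of the "used" memory is cache that Linux will hand back when a process asks for it.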

So try to run your Spark app on the cluster as is. If you actually run into memory issues, you will see a lot of spill to disk or OOM errors.
