Restarting an always-running cluster doesn't free the memory?
03-04-2025 05:53 AM
Hello community,
I was working on optimising driver memory, since some of our code is not optimised for Spark, and as a temporary measure I was planning to restart the cluster to free up the memory.
That seemed like a viable solution: the cluster is idle during the first few minutes of each hour, so that would be a good moment to restart it and free the memory. But looking at the standard output, it seems no memory is actually freed. Why is that? Do I need to terminate and start the cluster instead of restarting it?
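For reference, one quick way to check whether driver memory actually drops after a restart is to inspect it from a notebook cell. This is a minimal sketch assuming psutil is importable on the driver (it usually is on Databricks runtimes; otherwise install it first); it reports OS-level memory on the driver node, not the Spark driver's JVM heap specifically.

```python
# Minimal sketch: report OS-level memory usage on the driver node.
# Assumes psutil is available on the driver; this shows OS memory,
# not the Spark driver's JVM heap specifically.
import psutil

mem = psutil.virtual_memory()
print(
    f"Driver memory: {mem.used / 1e9:.2f} GB used of {mem.total / 1e9:.2f} GB "
    f"({mem.percent}% used)"
)
```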
03-04-2025 06:36 AM
Hi @jeremy98,
Databricks generally recommends restarting clusters regularly, particularly interactive ones, as routine clean-up. Restarting, or terminating and starting the cluster anew, stops all processes and frees the memory, so a restart should indeed clean it up. You can verify this in the cluster metrics once it has restarted.
03-04-2025 06:47 AM - edited 03-04-2025 06:48 AM
Thanks, Alberto, for the clarification! Yes, that's right: after the restart the metrics UI shows the logs twice, for the driver and for the worker(s). I think that is normal behaviour.
03-04-2025 06:53 AM
No problem, happy to assist!
03-04-2025 07:14 AM
Hi, I have another question: Usually, the driver should free memory by itself, but is it possible that the driver fails to do so? Why does this happen, and what issues can arise from this behavior?
03-04-2025 07:22 AM
Yes, it is indeed possible that the driver fails to free memory for some reason. If that happens, you will see failures like these:
- Unresponsiveness: The driver may become unresponsive, leading to failed health checks and potential restarts or kills by watchdog mechanisms.
- Frequent Restarts: Continuous memory pressure and GC overhead can cause the driver to restart frequently, leading to interruptions in job executions and degraded performance.
- Out of Memory (OOM) Conditions: Eventually, the driver might run out of memory, leading to crashes and job failures with explicit OOM errors.
03-04-2025 07:26 AM
Exactly, thanks, Alberto! But in general, is it best practice to restart a cluster every week to prevent this issue? Or does this problem happen because the code is not well-written?
03-04-2025 07:32 AM
Correct, it is best practice to restart the cluster regularly! Regular restarts help mitigate memory leaks and accumulated GC overhead.
As for whether it happens because of your code: it depends on what you are doing and whether you follow best practices, but I would need more details to tell.
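If you decide to restart on a schedule (for example weekly, or during the quiet window at the top of each hour mentioned earlier), one option is to call the Clusters REST API (`POST /api/2.0/clusters/restart`). A minimal sketch, assuming a personal access token; the workspace URL and cluster ID below are placeholders:

```python
# Minimal sketch: request a cluster restart via the Databricks Clusters REST API.
# Host, token, and cluster ID are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
CLUSTER_ID = "<cluster-id>"

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/restart",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": CLUSTER_ID},
)
resp.raise_for_status()
print("Restart requested for cluster", CLUSTER_ID)
```

Scheduling this as a small job (or an external cron task) in the cluster's idle window gives a regular clean-up without manual intervention.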
a month ago
Hi,
The code synchronizes Databricks with PostgreSQL by identifying differences and applying INSERT, UPDATE, or DELETE operations to update PostgreSQL. The steps are as follows:
- Read the source data in Databricks using a simple spark.sql query.
- Read the data from PostgreSQL using the JDBC driver.
- Perform a JOIN operation to identify differences.
- Collect the data using .collect() (I am now trying to use .toLocalIterator()).
- Chunk the data and iterate over it, executing DML operations using psycopg2 in batch (extras.execute_batch()), pushing a list of tuples with page_size=1000.
- …and that’s all.
Could the issue be that psycopg2 is not a Spark API, so all execution happens on the driver? Or is the .collect() operation causing a bottleneck by bringing too much data to the driver at once? (A minimal sketch of the flow follows.)
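Here is a minimal sketch of the flow described above, with placeholder table names, columns, and credentials; only the INSERT path of the diff is shown, and `toLocalIterator()` is used in place of `.collect()`:

```python
# Minimal sketch of the Databricks -> PostgreSQL sync described above.
# Table names, columns, and connection details are placeholders.
import psycopg2
from psycopg2 import extras

# 1. Read the source data in Databricks.
source_df = spark.sql("SELECT id, col_a, col_b FROM catalog.schema.source_table")

# 2. Read the current PostgreSQL state via the JDBC driver.
pg_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<db>")
    .option("dbtable", "public.target_table")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# 3. Identify differences (here: rows present in the source but missing in PostgreSQL).
to_insert = source_df.join(pg_df, on="id", how="left_anti")

# 4./5. Stream rows to the driver and push them in batches with psycopg2.
conn = psycopg2.connect(host="<host>", dbname="<db>", user="<user>", password="<password>")
insert_sql = "INSERT INTO public.target_table (id, col_a, col_b) VALUES (%s, %s, %s)"
with conn, conn.cursor() as cur:
    batch = []
    for row in to_insert.toLocalIterator():  # unlike .collect(), does not materialise everything at once
        batch.append((row["id"], row["col_a"], row["col_b"]))
        if len(batch) >= 1000:
            extras.execute_batch(cur, insert_sql, batch, page_size=1000)
            batch = []
    if batch:
        extras.execute_batch(cur, insert_sql, batch, page_size=1000)
conn.close()
```

Since psycopg2 is plain Python rather than a Spark API, all of the DML writes run on the driver, and `.collect()` materialises the entire diff in driver memory at once; `toLocalIterator()` pulls one partition at a time, which keeps the driver's memory footprint bounded.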
a month ago
Any suggestions, @Alberto_Umana?

