Hello,
I am facing an issue that I do not understand.
I have a simple Scala notebook with a "read function" that reads a JSON file from external storage and makes a few changes to the resulting DataFrame. I run my tests on "all purpose" compute, DS3 v2 (14 GB / 4 cores), single node, single user.
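To give an idea, the read function is roughly the sketch below; the storage path, container and column names are placeholders, not the real ones:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Simplified sketch of the "read function": read a JSON file from external
// storage and apply a couple of small transformations to the DataFrame.
def readAndTransform(): DataFrame = {
  spark.read
    .json("abfss://container@account.dfs.core.windows.net/data/input.json") // placeholder path
    .withColumn("loaded_at", current_timestamp())
    .filter(col("id").isNotNull) // placeholder column
}
```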
Scenario 1: I put an infinite loop around the "read function" in this notebook, and the notebook executes properly. Several thousand loop iterations, no crash; GC is called from time to time but memory stays flat.
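The loop is essentially this, reusing the readAndTransform sketch above (count() stands in for whatever action the real function triggers):

```scala
// Scenario 1: call the read function forever in the interactive notebook.
var i = 0L
while (true) {
  readAndTransform().count() // force an action on each iteration
  i += 1
  if (i % 1000 == 0) println(s"iteration $i") // thousands of iterations, memory stays flat
}
```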
[Screenshot: Scenario 1]
Scenario 2: I replace the infinite loop with a single call to the "read function", and then schedule this notebook to run every minute as a job. The job executions fail after a number of iterations: the first failure happened after 240 runs, and then one every 20-30 notebook executions. A few points:
- There are many, many "[GC (Allocation Failure)" entries in stdout
- When the job fails: "java.lang.OutOfMemoryError: GC overhead limit exceeded" in stderr (each time the CPU drops to 0% in the screenshot)
- When the job fails, the job shows the message: "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically re-attached."
[Screenshot: Scenario 2]
I have searched the web but didn't find anything that helps.
I know that there is only a single SparkContext per Databricks compute, created when the compute is started. Also, I ran a test and saw that Databricks creates a new Spark session each time a task is executed in a job (I made a simple Scala notebook that calls "spark", and thus displays the Spark session id).
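A minimal version of that test notebook is just the cell below; in Databricks the last expression of a cell is displayed, so this shows the SparkSession instance, and the instance shown was different on every scheduled run:

```scala
// `spark` is the SparkSession pre-created by Databricks for the notebook.
// Displaying it (e.g. org.apache.spark.sql.SparkSession@1f2e3d4c) showed a
// different instance for each scheduled job run in my test.
spark
```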
Could my issue be related to Spark sessions not being correctly freed?
Does anybody have an idea of the origin of my issue?
Thanks in advance,
Loïc