Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Several executions of a single notebook lead to java.lang.OutOfMemoryError

loic
New Contributor III

Hello,

I am facing an issue that I do not understand. 

I have a simple Scala notebook with a "read function" that reads a JSON file from an external storage and applies a few changes to the resulting DataFrame. I run my tests on an all-purpose compute, DS3 v2 (14 GB / 4 cores), single node, single user.
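To give an idea, here is a simplified sketch of the "read function" (the storage path, the column names and the transformations are placeholders, not the real ones):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, current_timestamp}

// Simplified "read function": read a JSON file from external storage and apply
// a few changes to the resulting DataFrame.
def readAndTransform(): DataFrame = {
  val df = spark.read.json("abfss://mycontainer@mystorage.dfs.core.windows.net/input/sample.json")
  df.withColumn("ingested_at", current_timestamp())   // placeholder change
    .filter(col("id").isNotNull)                      // placeholder change
}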

Scenario 1: I wrap the "read function" in an infinite loop in this notebook, and the notebook executes properly: several thousand loop iterations, no crash, GC is called from time to time, but memory stays flat.
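In code, scenario 1 is essentially just this inside a single notebook run (readAndTransform being the sketch above; the action at the end stands in for whatever the real notebook triggers):

// Scenario 1: endless loop around the read function inside one notebook execution
while (true) {
  val df = readAndTransform()
  df.count()   // placeholder action to force the read and the transformations to run
}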

[Screenshot: Scenario 1]

Scenario 2: I replace the infinite loop with a single call to the "read function", and then schedule this notebook to be executed every minute. The job executions fail after several iterations: first after 240 iterations, and then every 20-30 notebook executions. A few points:

- There are many, many "[GC (Allocation Failure)" entries in stdout.
- When the job fails: java.lang.OutOfMemoryError: GC overhead limit exceeded in stderr (each time the CPU drops to 0% in the screenshot).
- When the job fails, the job shows the message: "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically re-attached."

[Screenshot: Scenario 2]
I have searched the web but I didn't find anything that helps.
I know that there is only a single SparkContext per Databricks compute, which is created when the compute is started. I also made a test and saw that Databricks creates a new Spark session each time a task is executed in a job (a simple Scala notebook that calls "spark" and thus displays the Spark session id, see the snippet below).
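The test notebook was essentially a single cell like this:

// Displays the SparkSession object (its identity changes whenever a new session
// is created) and the underlying SparkContext, which stays the same for the
// lifetime of the compute.
println(spark)
println(spark.sparkContext)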
Could my issue be related to Spark sessions not being correctly freed?
Does anybody have an idea of the origin of my issue?
Thanks in advance,

Loïc

1 ACCEPTED SOLUTION


loic
New Contributor III

Finally, we figured out the issue ourselves.
By default, Databricks creates a new Spark session for each new job run. It is possible to change this behavior with the following Spark configuration (to put in the Spark config section of the compute settings):

spark.databricks.session.share true

Once we enabled the shared session, the same session is correctly reused and scenario 2 works fine.
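To double-check that the setting was picked up after restarting the compute, it can be read back from a notebook (assuming the key is exposed through the runtime conf, which was the case for us):

// Should print "true" once the compute has been restarted with the new Spark config.
println(spark.conf.get("spark.databricks.session.share"))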


