Hello,
I am facing an issue that I do not understand.
I have a simple Scala notebook with a "read function" that reads a JSON file from external storage and makes a few changes to the resulting DataFrame. I run my tests on "all purpose" compute, DS3 v2 (14 GB / 4 cores), single node, single user.
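To give an idea, the read function is roughly the sketch below; the storage path, container and column names are placeholders, not the real ones:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Simplified sketch of the "read function": read a JSON file from external
// storage and apply a couple of small transformations to the DataFrame.
def readAndTransform(): DataFrame = {
  spark.read
    .json("abfss://container@account.dfs.core.windows.net/data/input.json") // placeholder path
    .withColumn("loaded_at", current_timestamp())
    .filter(col("id").isNotNull) // placeholder column
}
```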
Scenario 1: I put an infinite loop around the "read function" in this notebook, and the notebook executes properly. Several thousand loop iterations, no crash; GC is called from time to time but memory stays flat.
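The loop is essentially this, reusing the readAndTransform sketch above (count() stands in for whatever action the real function triggers):

```scala
// Scenario 1: call the read function forever in the interactive notebook.
var i = 0L
while (true) {
  readAndTransform().count() // force an action on each iteration
  i += 1
  if (i % 1000 == 0) println(s"iteration $i") // thousands of iterations, memory stays flat
}
```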
[Screenshot: Scenario 1]
Scenario 2: I replace the infinite loop with a single call to the "read function", and then schedule this notebook to run every minute as a job. The job executions fail after a number of iterations: the first failure happened after 240 runs, and then one every 20-30 notebook executions. A few points:
- There are many, many "[GC (Allocation Failure)" entries in stdout
- When the job fails: "java.lang.OutOfMemoryError: GC overhead limit exceeded" in stderr (each time the CPU drops to 0% in the screenshot)
- When the job fails, the job shows the message: "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically re-attached."
[Screenshot: Scenario 2]
I have searched the web but didn't find anything that helps.
I know that there is only a single SparkContext per Databricks compute, created when the compute is started. Also, I ran a test and saw that Databricks creates a new Spark session each time a task is executed in a job (I made a simple Scala notebook that calls "spark", and thus displays the Spark session id).
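A minimal version of that test notebook is just the cell below; in Databricks the last expression of a cell is displayed, so this shows the SparkSession instance, and the instance shown was different on every scheduled run:

```scala
// `spark` is the SparkSession pre-created by Databricks for the notebook.
// Displaying it (e.g. org.apache.spark.sql.SparkSession@1f2e3d4c) showed a
// different instance for each scheduled job run in my test.
spark
```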
Could my issue be related to Spark sessions not being correctly freed?
Does anybody have an idea of the origin of my issue?
Thanks in advance,
Loïc