08-02-2023 12:12 AM
A Python notebook that works on my machine crashes on Databricks, always at the same point, with the errors "The Python kernel is unresponsive" and "The Python process exited with exit code 134 (SIGABRT: Aborted)." There is no stack trace for debugging the issue in the notebook output or in the Databricks cluster's logs, and there are no memory spikes in the monitoring. What can I do to debug this issue?
08-02-2023 12:40 AM
Hi @TalY ,
08-02-2023 01:39 AM
I have been using the Ganglia UI, but I didn't see the memory running out. Is that the correct way to monitor memory usage? Are there other options?
08-02-2023 03:06 AM
08-04-2023 08:03 AM
This is almost surely OOM. Yes, you use the Metrics tab in the cluster UI to see memory usage. However, you may not see high memory usage before the OOM; something may be allocating a huge amount of memory all at once.
I think 90% of these issues are resolvable by code inspection. What step fails? Is it pulling a bunch of stuff to the driver? Are you allocating a huge dataset?
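If it isn't obvious which step fails, one option is to log the process's peak memory around each step, so that the last line written before the crash points at the culprit. This is only a minimal sketch using the Python standard library; `log_step` and `peak_rss_mib` are hypothetical helper names, and it assumes a Linux driver, where `ru_maxrss` is reported in KiB.

```python
import resource
import sys
from contextlib import contextmanager


def peak_rss_mib():
    """Peak resident set size of this process in MiB (assumes Linux, where ru_maxrss is in KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


@contextmanager
def log_step(name, out=None):
    """Print peak memory before and after a step; flush so the line survives an abort."""
    out = out if out is not None else sys.stderr
    before = peak_rss_mib()
    print(f"[start] {name}: peak RSS {before:.0f} MiB", file=out, flush=True)
    yield
    after = peak_rss_mib()
    print(f"[done]  {name}: peak RSS {after:.0f} MiB (+{after - before:.0f})", file=out, flush=True)
```

Wrap each suspicious step, for example `with log_step("to pandas"): ...`. If the kernel dies inside a step, its `[start]` line is printed but its `[done]` line never is.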
08-06-2023 06:44 AM
A couple of times I did notice log messages in the driver's logs about memory allocation failure, so I tried two things: using a smaller dataframe (from 200k rows down to 10k) and optimizing my use of pandas, which did not help. After some searching over the weekend, I found that adding the following lines prevents it from crashing:
08-06-2023 02:38 PM
@TalY - Could you please let us know the DBR version used for running? Kindly try DBR 12.2 LTS or above.
To debug this, look for the hs_err_pid.log file provided under the "python kernel unresponsive" error stack trace; it contains the details of the problematic JVM.
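On the Python side, the standard library's `faulthandler` module can dump the Python traceback when the process receives a fatal signal such as SIGABRT, which may help when no stack trace shows up otherwise. A minimal sketch follows; `enable_crash_traces` is a hypothetical helper, the log path is up to you, and whether the handler gets a chance to write before the kernel is killed in your environment is an assumption worth testing.

```python
import faulthandler


def enable_crash_traces(path):
    """Write Python tracebacks for fatal signals (SIGSEGV, SIGABRT, ...) to `path`.

    The file object is returned and must be kept open for as long as
    faulthandler is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Call it once at the top of the notebook, keep the returned file object in a variable, and check the file after the next crash.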
08-07-2023 12:50 AM
I am using DBR 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12).