Data Engineering

Fatal error: The Python kernel is unresponsive.

Data_Analytics1
Contributor III

I am using multithreading in this job, which creates 8 parallel jobs. It fails a few times a day and sometimes gets stuck in one of the Python notebook cell processes.
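For context, here is a minimal sketch of the pattern, assuming dbutils.notebook.run is what launches the parallel jobs (the notebook paths and timeout below are placeholders):

    from concurrent.futures import ThreadPoolExecutor

    # Placeholder paths for the 8 parallel notebook jobs.
    notebook_paths = [f"/Jobs/task_{i}" for i in range(8)]

    def run_notebook(path):
        # dbutils.notebook.run blocks until the child notebook finishes;
        # the second argument is the timeout in seconds.
        return dbutils.notebook.run(path, 3600)

    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_notebook, notebook_paths))

Here is the error: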

The Python process exited with an unknown exit code.

The last 10 KB of the process's stderr and stdout can be found below. See driver logs for full logs.

---------------------------------------------------------------------------

Last messages on stderr:

Tue Feb 7 17:10:18 2023 Connection to spark from PID 24461

Tue Feb 7 17:10:18 2023 Initialized gateway on port 34499

Tue Feb 7 17:10:19 2023 Connected to spark.

---------------------------------------------------------------------------

Last messages on stdout:

NOTE: When using the `ipython kernel` entry point, Ctrl-C will not work.

To exit, you will have to explicitly quit this process, by either sending

"quit" from a client, or using Ctrl-\ in UNIX-like environments.

To read more about this, see https://github.com/ipython/ipython/issues/2049

17 REPLIES

deedstoke
New Contributor II

We are also facing the same issue.

luis_herrera
New Contributor III

The "Fatal error: The Python kernel is unresponsive." message falls under troubleshooting unresponsive Python notebooks or cancelled commands. It can be caused by a number of problems, such as metastore connectivity issues or conflicting libraries. To troubleshoot, check metastore connectivity and review the "Cluster cancels Python command execution due to library conflict" KB article for more information:

https://kb.databricks.com/python-command-cancelled
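A quick way to check metastore connectivity from a notebook cell is a simple catalog query; if it hangs or throws, the metastore is a likely suspect (just a sanity check, not a full diagnosis):

    # If this trivial catalog query hangs or errors, metastore
    # connectivity is a likely cause of the unresponsive kernel.
    spark.sql("SHOW DATABASES").show()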

luis_herrera
New Contributor III

Hey, it seems the issue is related to the driver hitting a memory bottleneck, which causes it to crash with an out-of-memory (OOM) condition and restart, or to become unresponsive due to frequent full garbage collection. Common reasons for the bottleneck include: 1) the driver instance type is not sized for the load executed on the driver, 2) memory-intensive operations are executed on the driver, or 3) many notebooks or jobs are running in parallel on the same cluster.

The fix varies from case to case, but in the absence of specific details the easiest remedy is to increase the driver's memory. Beyond that, avoid memory-intensive operations on the driver, such as collect(), which pulls a large amount of data to the driver, or converting a large DataFrame to Pandas, and avoid running batch jobs on a shared interactive cluster. It is also recommended to distribute workloads across different clusters. (https://kb.databricks.com/jobs/driver-unavailable)
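To illustrate the last point, a rough sketch of swapping driver-heavy calls for alternatives that keep data on the executors (the table name and output path are placeholders):

    df = spark.table("my_database.my_table")  # placeholder table name

    # Driver-heavy patterns that can OOM the driver on large data:
    #   rows = df.collect()      # pulls every row into driver memory
    #   pdf = df.toPandas()      # materializes the whole DataFrame on the driver

    # Safer alternatives:
    sample_pdf = df.limit(1000).toPandas()             # cap what reaches the driver
    df.write.mode("overwrite").parquet("/tmp/output")  # keep the work distributed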

For more information on troubleshooting unresponsive Python notebooks or canceled commands, please refer to the Troubleshooting unresponsive Python notebooks or canceled commands article in the Databricks documentation.

PS: Check #DAIS2023 talks as well
