
Failure starting repl. How do I resolve this error? I got this error in a running job.

Data_Analytics1
Contributor III

Failure starting repl. Try detaching and re-attaching the notebook.

java.lang.Exception: Python repl did not start in 30 seconds.
    at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1442)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.startPython(JupyterDriverLocal.scala:1083)
    at com.databricks.backend.daemon.driver.JupyterDriverLocal.<init>(JupyterDriverLocal.scala:624)
    at com.databricks.backend.daemon.driver.PythonDriverWrapper.instantiateDriver(DriverWrapper.scala:723)
    at com.databricks.backend.daemon.driver.DriverWrapper.setupRepl(DriverWrapper.scala:342)
    at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:231)
    at java.lang.Thread.run(Thread.java:750)

9 REPLIES

Vivian_Wilfred
Honored Contributor

Hi @Mahesh Chahare, check whether the cluster is overloaded. This can happen when too many REPLs are being started because too many processes are running at once.

@Vivian Wilfred Previously, 20 jobs were running on one worker node. I have now reduced the number of jobs to 9 and increased the number of workers to 5, and I am no longer getting the REPL error. However, I am now getting a "TimeoutException: Futures timed out after [5 seconds]" error and a "Fatal error: The Python kernel is unresponsive" error. I was getting these errors in my previous run too. The REPL issue is resolved.

Lakshay
Esteemed Contributor

Hi @Mahesh Chahare, this issue usually happens when a job runs many parallel tasks, with each task trying to open a Python REPL. If this is the case for you, please try reducing the number of parallel tasks or increasing the driver's memory.
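For illustration, here is a minimal sketch of bounding that concurrency when the parallel tasks are child notebooks launched from a driver notebook. The notebook paths and the max_workers value are hypothetical placeholders, not part of the original job; adjust them to your setup.

from concurrent.futures import ThreadPoolExecutor

# Hypothetical child notebook paths -- replace with your own.
notebook_paths = [f"/Jobs/child_task_{i}" for i in range(8)]

def run_child(path):
    # dbutils.notebook.run blocks until the child notebook finishes;
    # the second argument is the timeout in seconds (0 = no timeout).
    return dbutils.notebook.run(path, 0)

# Cap concurrency so only a few Python REPLs start at once, instead of
# all children competing for driver memory at the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_child, notebook_paths))

Capping max_workers trades some wall-clock time for a bounded number of simultaneous REPL startups on the driver.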

@Lakshay Goel Same update as in my reply above: after reducing the jobs to 9 and increasing the workers to 5, the REPL error is resolved, but I am still getting the "TimeoutException: Futures timed out after [5 seconds]" and "Fatal error: The Python kernel is unresponsive" errors, which also appeared in my previous run.

Hi @Mahesh Chahare, are you using Azure Event Hubs? And could you tell me which DBR version you are working on?

Hi @Lakshay Goel, the first job uses Event Hubs, and the second job creates 8 parallel jobs inside it.

DBR version: 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)

Hi @Mahesh Chahare, the two issues are unrelated.

  1. For the "Fatal error: The Python kernel is unresponsive" error, please set the Spark configuration spark.databricks.python.defaultPythonRepl to pythonshell.
  2. Regarding the "TimeoutException: Futures timed out after [5 seconds]" error, I suspect the cause is the Event Hubs connector. You might want to check the version of your Event Hubs connector. Also, you can try using the Kafka connector to connect to Event Hubs; a sketch of both suggestions follows below.
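To make both suggestions concrete, here is a minimal sketch. Note that spark.databricks.python.defaultPythonRepl belongs in the cluster's Spark config, since it controls how the REPL itself starts. The Event Hubs namespace, event hub name, and secret scope below are hypothetical placeholders; the Kafka options follow the usual SASL_SSL/PLAIN pattern for the Event Hubs Kafka-compatible endpoint.

# 1. In the cluster's Spark config (Compute > cluster > Advanced options):
#
#      spark.databricks.python.defaultPythonRepl pythonshell
#
# 2. Read from Event Hubs through its Kafka-compatible endpoint using the
#    Kafka connector. Namespace, event hub name, and secret scope are
#    placeholders -- replace with your own.
EH_NAMESPACE = "my-namespace"
EH_NAME = "my-eventhub"
EH_CONN_STR = dbutils.secrets.get("my-scope", "eh-connection-string")

kafka_options = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    # Databricks ships a shaded Kafka client, hence the kafkashaded prefix.
    "kafka.sasl.jaas.config": (
        'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
        f'required username="$ConnectionString" password="{EH_CONN_STR}";'
    ),
    "subscribe": EH_NAME,
}

df = (spark.readStream
      .format("kafka")
      .options(**kafka_options)
      .load())

Going through the Kafka endpoint removes the Event Hubs connector library (and its version mismatches) from the picture entirely.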

Anonymous
Not applicable

Hi @Mahesh Chahare,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
