Recently, my Databricks jobs have been failing with the following error message:
Failure starting repl. Try detaching and re-attaching the notebook.
java.lang.Exception: Python repl did not start in 30 seconds seconds.
at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1469)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.startPython(JupyterDriverLocal.scala:1084)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.<init>(JupyterDriverLocal.scala:624)
at com.databricks.backend.daemon.driver.PythonDriverWrapper.instantiateDriver(DriverWrapper.scala:712)
at com.databricks.backend.daemon.driver.DriverWrapper.setupRepl(DriverWrapper.scala:342)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:231)
at java.lang.Thread.run(Thread.java:750)
The job is attached to a pool using Spot Pricing.
What is the best way to avoid this? Would adding a retry to the job help?