cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

JKR
Contributor

Got below failure on scheduled job on interactive cluster and the next scheduled run executed fine.
I want to know why this error occurred and how can I prevent it to happen again.
And how to debug these errors in future ?

 

 

com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException
at com.databricks.backend.daemon.driver.JupyterKernelListener.waitForExecution(JupyterKernelListener.scala:811)
at com.databricks.backend.daemon.driver.JupyterKernelListener.executeCommand(JupyterKernelListener.scala:857)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.executePython(JupyterDriverLocal.scala:578)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.repl(JupyterDriverLocal.scala:535)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$24(DriverLocal.scala:879)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:862)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:412)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:158)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:410)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:407)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:69)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:455)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:440)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:69)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:839)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:660)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:652)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:571)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:606)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:448)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:389)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:247)
at java.lang.Thread.run(Thread.java:750)
 
 
23/09/02 03:28:46 INFO WorkflowDriver: Workflow run exited with error
com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
23/09/02 03:28:47 WARN JupyterDriverLocal: User code returned error with traceback: ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-92643869439321>:4
      2 Cutoff_time=datetime.now()
      3 while  end_time1<= Cutoff_time:
----> 4      status=dbutils.notebook.run("/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",1000, {'Start time':start_time,'End time':end_time})
      5      start_time= dt.strptime(start_time,'%Y-%m-%d %H:%M:%S')+timedelta(minutes=5)
      6      start_time=start_time.strftime("%Y-%m-%d %H:%M:%S")
 
File /databricks/python_shell/dbruntime/dbutils.py:204, in DBUtils.NotebookHandler.run(self, path, timeout_seconds, arguments, _NotebookHandler__databricks_internal_cluster_spec)
    203 def run(self, path, timeout_seconds, arguments={}, __databricks_internal_cluster_spec=None):
--> 204     return self.entry_point.getDbutils().notebook()._run(
    205         path, timeout_seconds, arguments, __databricks_internal_cluster_spec,
    206         self.entry_point.getJobGroupId())
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()
 
File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception.<locals>.deco(*a, **kw)
    226 def deco(*a: Any, **kw: Any) -> Any:
    227     try:
--> 228         return f(*a, **kw)
    229     except Py4JJavaError as e:
    230         converted = convert_exception(e.java_exception)
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))
 
Py4JJavaError: An error occurred while calling o445._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:99)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
... 13 more
1 ACCEPTED SOLUTION

Accepted Solutions

Tharun-Kumar
Databricks Employee
Databricks Employee

@JKR 

Could you try setting the configurations below at the cluster level and retry the job?

spark.databricks.python.defaultPythonRepl pythonshell
spark.databricks.pyspark.py4j.pinnedThread false

View solution in original post

4 REPLIES 4

Tharun-Kumar
Databricks Employee
Databricks Employee

@JKR 

Could you try setting the configurations below at the cluster level and retry the job?

spark.databricks.python.defaultPythonRepl pythonshell
spark.databricks.pyspark.py4j.pinnedThread false

@Tharun-Kumar   Thanks for sharing this configuration, mentioned error was occurred one time only and the next run executed successfully. Do I still need to add these configurations ? 

Can you please explain or share any doc to let me know the purpose of these configurations.


Tharun-Kumar
Databricks Employee
Databricks Employee

@JKR 

This is to eliminate the possibility of any regression issue due to ipykernel. By setting this we switch back to the default python shell.

jose_gonzalez
Databricks Employee
Databricks Employee

@JKR 
Just a friendly follow-up. Did any of the responses help you to resolve your question? if it did, please mark it as best. Otherwise, please let us know if you still need help.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group