Data Engineering
Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

JKR
New Contributor III

I got the failure below on a scheduled job running on an interactive cluster, and the next scheduled run executed fine.
I want to know why this error occurred and how I can prevent it from happening again.
Also, how can I debug errors like this in the future?

 

 

com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException
at com.databricks.backend.daemon.driver.JupyterKernelListener.waitForExecution(JupyterKernelListener.scala:811)
at com.databricks.backend.daemon.driver.JupyterKernelListener.executeCommand(JupyterKernelListener.scala:857)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.executePython(JupyterDriverLocal.scala:578)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.repl(JupyterDriverLocal.scala:535)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$24(DriverLocal.scala:879)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:862)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:412)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:158)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:410)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:407)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:69)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:455)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:440)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:69)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:839)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:660)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:652)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:571)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:606)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:448)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:389)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:247)
at java.lang.Thread.run(Thread.java:750)
 
 
23/09/02 03:28:46 INFO WorkflowDriver: Workflow run exited with error
com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
23/09/02 03:28:47 WARN JupyterDriverLocal: User code returned error with traceback: ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-92643869439321>:4
      2 Cutoff_time=datetime.now()
      3 while  end_time1<= Cutoff_time:
----> 4      status=dbutils.notebook.run("/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",1000, {'Start time':start_time,'End time':end_time})
      5      start_time= dt.strptime(start_time,'%Y-%m-%d %H:%M:%S')+timedelta(minutes=5)
      6      start_time=start_time.strftime("%Y-%m-%d %H:%M:%S")
 
File /databricks/python_shell/dbruntime/dbutils.py:204, in DBUtils.NotebookHandler.run(self, path, timeout_seconds, arguments, _NotebookHandler__databricks_internal_cluster_spec)
    203 def run(self, path, timeout_seconds, arguments={}, __databricks_internal_cluster_spec=None):
--> 204     return self.entry_point.getDbutils().notebook()._run(
    205         path, timeout_seconds, arguments, __databricks_internal_cluster_spec,
    206         self.entry_point.getJobGroupId())
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()
 
File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception.<locals>.deco(*a, **kw)
    226 def deco(*a: Any, **kw: Any) -> Any:
    227     try:
--> 228         return f(*a, **kw)
    229     except Py4JJavaError as e:
    230         converted = convert_exception(e.java_exception)
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))
 
Py4JJavaError: An error occurred while calling o445._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:99)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
... 13 more
4 REPLIES

Tharun-Kumar
Honored Contributor II (Accepted Solution)

@JKR 

Could you try setting the configurations below at the cluster level and retry the job?

spark.databricks.python.defaultPythonRepl pythonshell
spark.databricks.pyspark.py4j.pinnedThread false
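
For reference, these are cluster-level Spark configurations: on an interactive cluster they go under the cluster's Advanced options > Spark config (one key-value pair per line, exactly as written above) and only take effect after the cluster restarts. A minimal sketch of how you could verify from a notebook that they were applied, assuming the standard spark.conf API available in Databricks notebooks (the "unset" fallback strings are my own placeholders):

# Run in a notebook attached to the cluster after restarting it.
# spark.conf.get raises if a key is unset, so a default is passed
# to make the check safe on clusters where the configs are absent.
repl = spark.conf.get("spark.databricks.python.defaultPythonRepl", "unset")
pinned = spark.conf.get("spark.databricks.pyspark.py4j.pinnedThread", "unset")
print(f"defaultPythonRepl = {repl}")
print(f"pinnedThread      = {pinned}")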

JKR
New Contributor III

@Tharun-Kumar Thanks for sharing these configurations. The error occurred only once, and the next run executed successfully. Do I still need to add these configurations?

Could you also explain, or share any documentation describing, the purpose of these configurations?


Tharun-Kumar
Honored Contributor II

@JKR 

This is to eliminate the possibility of a regression issue caused by ipykernel. By setting these configurations, we switch back to the default Python shell.
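
Since the failure was transient (the next scheduled run succeeded), another option, independent of the configuration change above, is to make the orchestrating notebook tolerant of one-off REPL failures by retrying dbutils.notebook.run. A minimal sketch: run_with_retry is a hypothetical helper, the notebook path and parameters are taken from the traceback above, and start_time/end_time come from the surrounding loop in the caller notebook:

import time

def run_with_retry(path, timeout_seconds, arguments, max_retries=2, backoff_seconds=60):
    """Hypothetical helper: retry dbutils.notebook.run on transient failures."""
    last_error = None
    for attempt in range(1, max_retries + 2):
        try:
            return dbutils.notebook.run(path, timeout_seconds, arguments)
        except Exception as e:  # e.g. Py4JJavaError wrapping a WorkflowException
            last_error = e
            if attempt <= max_retries:
                time.sleep(backoff_seconds)  # wait before the next attempt
    raise last_error

# Usage, with the path and parameters from the traceback above:
status = run_with_retry(
    "/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",
    1000,
    {"Start time": start_time, "End time": end_time},
)

Note that a retry only papers over the symptom; if the failure recurs, the driver logs around the failing run are still the place to look for the root cause.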

jose_gonzalez
Moderator

@JKR 
Just a friendly follow-up: did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.