Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

JKR
New Contributor III

Got the failure below on a scheduled job running on an interactive cluster; the next scheduled run executed fine.
I want to know why this error occurred and how I can prevent it from happening again.
And how can I debug these errors in the future?

com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException
at com.databricks.backend.daemon.driver.JupyterKernelListener.waitForExecution(JupyterKernelListener.scala:811)
at com.databricks.backend.daemon.driver.JupyterKernelListener.executeCommand(JupyterKernelListener.scala:857)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.executePython(JupyterDriverLocal.scala:578)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.repl(JupyterDriverLocal.scala:535)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$24(DriverLocal.scala:879)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:862)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:412)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:158)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:410)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:407)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:69)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:455)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:440)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:69)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:839)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:660)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:652)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:571)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:606)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:448)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:389)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:247)
at java.lang.Thread.run(Thread.java:750)
 
 
23/09/02 03:28:46 INFO WorkflowDriver: Workflow run exited with error
com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
23/09/02 03:28:47 WARN JupyterDriverLocal: User code returned error with traceback: ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-92643869439321>:4
      2 Cutoff_time=datetime.now()
      3 while  end_time1<= Cutoff_time:
----> 4      status=dbutils.notebook.run("/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",1000, {'Start time':start_time,'End time':end_time})
      5      start_time= dt.strptime(start_time,'%Y-%m-%d %H:%M:%S')+timedelta(minutes=5)
      6      start_time=start_time.strftime("%Y-%m-%d %H:%M:%S")
 
File /databricks/python_shell/dbruntime/dbutils.py:204, in DBUtils.NotebookHandler.run(self, path, timeout_seconds, arguments, _NotebookHandler__databricks_internal_cluster_spec)
    203 def run(self, path, timeout_seconds, arguments={}, __databricks_internal_cluster_spec=None):
--> 204     return self.entry_point.getDbutils().notebook()._run(
    205         path, timeout_seconds, arguments, __databricks_internal_cluster_spec,
    206         self.entry_point.getJobGroupId())
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()
 
File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception.<locals>.deco(*a, **kw)
    226 def deco(*a: Any, **kw: Any) -> Any:
    227     try:
--> 228         return f(*a, **kw)
    229     except Py4JJavaError as e:
    230         converted = convert_exception(e.java_exception)
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))
 
Py4JJavaError: An error occurred while calling o445._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:99)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
... 13 more
4 REPLIES

Tharun-Kumar
Honored Contributor II
(Accepted solution)

@JKR 

Could you try setting the configurations below at the cluster level and retry the job?

spark.databricks.python.defaultPythonRepl pythonshell
spark.databricks.pyspark.py4j.pinnedThread false
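
If it helps, here is roughly how those two settings could be applied outside the UI. The usual route is simply pasting the two lines above into the cluster's Spark config box (Advanced options), but a minimal sketch using the Clusters REST API follows; the workspace URL, token, and cluster ID are placeholders, and the field list in the edit payload is only an illustration of the get-then-edit pattern, not a definitive spec:

import requests

HOST = "https://<workspace-url>"         # placeholder
TOKEN = "<personal-access-token>"        # placeholder
CLUSTER_ID = "<interactive-cluster-id>"  # placeholder
headers = {"Authorization": f"Bearer {TOKEN}"}

# clusters/edit expects a full cluster spec, so start from the existing
# definition instead of sending spark_conf on its own.
spec = requests.get(f"{HOST}/api/2.0/clusters/get",
                    headers=headers,
                    params={"cluster_id": CLUSTER_ID}).json()

spark_conf = spec.get("spark_conf", {})
spark_conf.update({
    "spark.databricks.python.defaultPythonRepl": "pythonshell",
    "spark.databricks.pyspark.py4j.pinnedThread": "false",
})

edit_payload = {
    "cluster_id": CLUSTER_ID,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    # copy the "autoscale" block instead if the cluster autoscales
    "num_workers": spec.get("num_workers", 0),
    "spark_conf": spark_conf,
}
requests.post(f"{HOST}/api/2.0/clusters/edit",
              headers=headers, json=edit_payload).raise_for_status()

Either way, the configs only take effect after the cluster is restarted.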

JKR
New Contributor III

@Tharun-Kumar Thanks for sharing this configuration. The mentioned error occurred only once, and the next run executed successfully. Do I still need to add these configurations?

Can you please explain the purpose of these configurations, or share any doc that describes them?


Tharun-Kumar
Honored Contributor II

@JKR 

This is to eliminate the possibility of any regression issue due to ipykernel. By setting these, we switch back to the default Python shell.
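
As a quick sanity check after applying them, the effective values can usually be read back from a notebook attached to the cluster (this assumes the keys are visible through the Spark runtime conf; the fallback string below just marks an unset key):

# Run in a Databricks notebook on the cluster, where `spark` is predefined.
print(spark.conf.get("spark.databricks.python.defaultPythonRepl", "<not set>"))
print(spark.conf.get("spark.databricks.pyspark.py4j.pinnedThread", "<not set>"))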

jose_gonzalez
Moderator

@JKR 
Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.
