Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

JKR
New Contributor III

The job is scheduled on an interactive cluster. It failed with the error below, but the next scheduled run completed without issue.
I want to understand why this error occurred and how I can prevent it from happening again.

How do I debug this type of error?

 

 

com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException
at com.databricks.backend.daemon.driver.JupyterKernelListener.waitForExecution(JupyterKernelListener.scala:811)
at com.databricks.backend.daemon.driver.JupyterKernelListener.executeCommand(JupyterKernelListener.scala:857)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.executePython(JupyterDriverLocal.scala:578)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.repl(JupyterDriverLocal.scala:535)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$24(DriverLocal.scala:879)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:862)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:412)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:158)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:410)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:407)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:69)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:455)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:440)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:69)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:839)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:660)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:652)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:571)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:606)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:448)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:389)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:247)
at java.lang.Thread.run(Thread.java:750)
 
23/09/02 03:28:46 INFO WorkflowDriver: Workflow run exited with error
com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
23/09/02 03:28:47 WARN JupyterDriverLocal: User code returned error with traceback: ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-92643869439321>:4
      2 Cutoff_time=datetime.now()
      3 while  end_time1<= Cutoff_time:
----> 4      status=dbutils.notebook.run("/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",1000, {'Start time':start_time,'End time':end_time})
      5      start_time= dt.strptime(start_time,'%Y-%m-%d %H:%M:%S')+timedelta(minutes=5)
      6      start_time=start_time.strftime("%Y-%m-%d %H:%M:%S")
 
File /databricks/python_shell/dbruntime/dbutils.py:204, in DBUtils.NotebookHandler.run(self, path, timeout_seconds, arguments, _NotebookHandler__databricks_internal_cluster_spec)
    203 def run(self, path, timeout_seconds, arguments={}, __databricks_internal_cluster_spec=None):
--> 204     return self.entry_point.getDbutils().notebook()._run(
    205         path, timeout_seconds, arguments, __databricks_internal_cluster_spec,
    206         self.entry_point.getJobGroupId())
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()
 
File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception.<locals>.deco(*a, **kw)
    226 def deco(*a: Any, **kw: Any) -> Any:
    227     try:
--> 228         return f(*a, **kw)
    229     except Py4JJavaError as e:
    230         converted = convert_exception(e.java_exception)
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))
 
Py4JJavaError: An error occurred while calling o445._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:99)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
... 13 more

 

 

 


2 REPLIES

Tharun-Kumar
Honored Contributor II (Accepted Solution)

@JKR 

Could you try setting the configurations below at the cluster level and retry the job?

spark.databricks.python.defaultPythonRepl pythonshell
spark.databricks.pyspark.py4j.pinnedThread false
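
These are cluster-level Spark configuration entries (typically pasted into the cluster's Spark config under Advanced options) rather than something set from notebook code, so the cluster needs a restart for them to take effect. A minimal sketch to confirm from a notebook that the values were picked up, assuming the standard spark.conf accessor; the "<not set>" fallback string is just illustrative:

# Hedged sketch: check whether the cluster-level Spark configs were applied.
# spark.conf.get falls back to the second argument when the key is not set.
for key in (
    "spark.databricks.python.defaultPythonRepl",
    "spark.databricks.pyspark.py4j.pinnedThread",
):
    print(key, "=", spark.conf.get(key, "<not set>"))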

JKR
New Contributor III

@Tharun-Kumar My job is scheduled to run every 5 minutes, and the next scheduled run (and the runs after the failed one) executed fine without adding these configs. So how does adding the above configurations help me identify the root cause? I really want to know how I can debug this kind of intermittent issue.
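
One way to capture more context the next time an intermittent failure like this happens is to wrap the dbutils.notebook.run call so each failure is logged with a timestamp (which can then be correlated with the driver logs and cluster event log) before the loop gives up. A hedged sketch of that idea only; the helper name, retry count, and sleep interval are illustrative placeholders, and the call shape mirrors the one in the traceback above:

import time
from datetime import datetime

def run_with_retry(path, timeout_s, args, max_attempts=2):
    # Hypothetical helper: retries a child notebook run and prints each failure
    # so intermittent errors leave a timestamped trace in the job output.
    for attempt in range(1, max_attempts + 1):
        try:
            return dbutils.notebook.run(path, timeout_s, args)
        except Exception as e:
            print(f"{datetime.now()} attempt {attempt} failed for {path}: {e}")
            if attempt == max_attempts:
                raise
            time.sleep(30)  # brief pause before retrying

# Illustrative usage with the same arguments as in the traceback:
# status = run_with_retry(
#     "/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",
#     1000,
#     {"Start time": start_time, "End time": end_time},
# )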
