Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

JKR
Contributor

The job is scheduled on an interactive cluster. It failed with the error below, but the next scheduled run completed fine.
I want to know why this error occurred and how I can prevent it from occurring again.

How do I debug these types of errors?

com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException
at com.databricks.backend.daemon.driver.JupyterKernelListener.waitForExecution(JupyterKernelListener.scala:811)
at com.databricks.backend.daemon.driver.JupyterKernelListener.executeCommand(JupyterKernelListener.scala:857)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.executePython(JupyterDriverLocal.scala:578)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.repl(JupyterDriverLocal.scala:535)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$24(DriverLocal.scala:879)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$21(DriverLocal.scala:862)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:412)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:158)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:410)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:407)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:69)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:455)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:440)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:69)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:839)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:660)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:652)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:571)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:606)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:448)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:389)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:247)
at java.lang.Thread.run(Thread.java:750)
 
23/09/02 03:28:46 INFO WorkflowDriver: Workflow run exited with error
com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
23/09/02 03:28:47 WARN JupyterDriverLocal: User code returned error with traceback: ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-92643869439321>:4
      2 Cutoff_time=datetime.now()
      3 while  end_time1<= Cutoff_time:
----> 4      status=dbutils.notebook.run("/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",1000, {'Start time':start_time,'End time':end_time})
      5      start_time= dt.strptime(start_time,'%Y-%m-%d %H:%M:%S')+timedelta(minutes=5)
      6      start_time=start_time.strftime("%Y-%m-%d %H:%M:%S")
 
File /databricks/python_shell/dbruntime/dbutils.py:204, in DBUtils.NotebookHandler.run(self, path, timeout_seconds, arguments, _NotebookHandler__databricks_internal_cluster_spec)
    203 def run(self, path, timeout_seconds, arguments={}, __databricks_internal_cluster_spec=None):
--> 204     return self.entry_point.getDbutils().notebook()._run(
    205         path, timeout_seconds, arguments, __databricks_internal_cluster_spec,
    206         self.entry_point.getJobGroupId())
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()
 
File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception.<locals>.deco(*a, **kw)
    226 def deco(*a: Any, **kw: Any) -> Any:
    227     try:
--> 228         return f(*a, **kw)
    229     except Py4JJavaError as e:
    230         converted = convert_exception(e.java_exception)
 
File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))
 
Py4JJavaError: An error occurred while calling o445._run.
: com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:99)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl.run(NotebookUtilsImpl.scala:130)
at com.databricks.dbutils_v1.impl.NotebookUtilsImpl._run(NotebookUtilsImpl.scala:92)
at sun.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.databricks.NotebookExecutionException: FAILED
at com.databricks.workflow.WorkflowDriver.run0(WorkflowDriver.scala:147)
at com.databricks.workflow.WorkflowDriver.run(WorkflowDriver.scala:94)
... 13 more

1 ACCEPTED SOLUTION

Tharun-Kumar
Databricks Employee

@JKR 

Could you try setting the configurations below at the cluster level and then retrying the job?

spark.databricks.python.defaultPythonRepl pythonshell
spark.databricks.pyspark.py4j.pinnedThread false
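
For anyone applying these, they go under the cluster's Spark config. A minimal sketch of the equivalent spark_conf block as it would appear in a cluster or job-cluster spec (standard Clusters API field; adapt to however the cluster is managed):

spark_conf = {
    # Fall back to the legacy Python shell REPL instead of the Jupyter kernel
    # (the JupyterKernelListener frames in the stack trace above come from the
    # Jupyter-based REPL).
    "spark.databricks.python.defaultPythonRepl": "pythonshell",
    # Turn off py4j pinned-thread mode.
    "spark.databricks.pyspark.py4j.pinnedThread": "false",
}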


JKR
Contributor

@Tharun-Kumar My job is scheduled to run every 5 minutes, and the next scheduled run (and the runs after the failed one) executed fine without adding these configs. So how does adding the above configurations help me identify the root cause? I really want to know how to debug this kind of intermittent issue.
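
Since the failed run succeeded on the next attempt, one general way to make intermittent REPL crashes both survivable and easier to diagnose (not from this thread; a minimal sketch built around the dbutils.notebook.run call shown in the traceback above, with a hypothetical run_with_retry helper) is to wrap the child-notebook call in a retry loop that records each failure:

import time

def run_with_retry(path, timeout_seconds, arguments, max_retries=3, backoff_seconds=60):
    """Retry dbutils.notebook.run on transient failures, keeping the last error."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            # dbutils is available in Databricks notebooks without an import.
            return dbutils.notebook.run(path, timeout_seconds, arguments)
        except Exception as e:  # e.g. Py4JJavaError raised when the REPL dies
            last_error = e
            print(f"Attempt {attempt}/{max_retries} failed: {e}")
            time.sleep(backoff_seconds)
    raise last_error  # surface the original failure if all retries fail

# Drop-in replacement for the call in the loop from the traceback:
status = run_with_retry(
    "/Repos/Master/ADB/OKC/Cov/02-okc-silver-5min-agg-batch",
    1000,
    {"Start time": start_time, "End time": end_time},
)

On its own this only masks the failure; pairing the captured exception text with the driver's log4j output and the cluster event log for the failure window can help correlate a ReplFatalException with a kernel restart or driver memory pressure.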
