cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

com.databricks.spark.safespark.UDFException: UNAVAILABLE: Channel shutdownNow invoked

zak_k
New Contributor III

Trying to determine a root cause of UDFException that occurs when returning a variable length ArrayType. If I hardcode the data returned from the UDF to a fixed length, say 19, the error does not occur.

 

Setup code

split_runs_UDF = udf(split_runs_udf, ArrayType(StructType([StructField('split_run_id', StringType(), True), StructField('split_run_start_time', TimestampType(), True), StructField('split_run_end_time', TimestampType(), True)])))

The data gets subsequently `explode`d and then the `StructField`s are mapped to columns.

SparkException: Job aborted due to stage failure: Task 0 in stage 46.0 failed 4 times, most recent failure: Lost task 0.3 in stage 46.0 (TID 133) (10.0.17.11 executor 0): com.databricks.spark.safespark.UDFException: UNAVAILABLE: Channel shutdownNow invoked

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 46.0 failed 4 times, most recent failure: Lost task 0.3 in stage 46.0 (TID 133) (10.0.17.11 executor 0): com.databricks.spark.safespark.UDFException: UNAVAILABLE: Channel shutdownNow invoked
	at com.databricks.spark.safespark.udf.UDFSession.handleException(UDFSession.scala:128)
	at com.databricks.spark.safespark.udf.UDFSession$$anon$2.onError(UDFSession.scala:183)
	at grpc_shaded.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
	at grpc_shaded.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
	at grpc_shaded.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
	at grpc_shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
	at grpc_shaded.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
	at grpc_shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at grpc_shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750) Caused by: grpc_shaded.io.grpc.StatusRuntimeException: UNAVAILABLE: Channel shutdownNow invoked
	at grpc_shaded.io.grpc.Status.asRuntimeException(Status.java:535)
	... 10 more Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3555)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3487)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3476)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3476)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1493)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1493)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1493)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3801)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3713)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3701)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1217)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1205)
	at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2946)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$runSparkJobs$1(Collector.scala:338)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:282)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$collect$1(Collector.scala:366)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:363)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:117)
	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:124)
	at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:126)
	at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:114)
	at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:94)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$computeResult$1(ResultCacheManager.scala:553)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.collectResult$1(ResultCacheManager.scala:545)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.computeResult(ResultCacheManager.scala:565)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:426)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:419)
	at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:313)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeCollectResult$1(SparkPlan.scala:504)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:501)
	at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3628)
	at org.apache.spark.sql.Dataset.$anonfun$collectResult$1(Dataset.scala:3619)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$3(Dataset.scala:4544)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:935)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4542)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:274)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:498)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:201)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1113)
	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:151)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:447)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4542)
	at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3618)
	at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:267)
	at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:101)
	at com.databricks.backend.daemon.driver.PythonDriverLocalBase.generateTableResult(PythonDriverLocalBase.scala:773)
	at com.databricks.backend.daemon.driver.JupyterDriverLocal.computeListResultsItem(JupyterDriverLocal.scala:1077)
	at com.databricks.backend.daemon.driver.JupyterDriverLocal$JupyterEntryPoint.addCustomDisplayData(JupyterDriverLocal.scala:254)
	at sun.reflect.GeneratedMethodAccessor616.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)

 

5 REPLIES 5

Noopur_Nigam
Valued Contributor
Valued Contributor

Hi @zak_k Could you give more context on the usecase? Are you using this udf in a DLT pipeline? which dbr version you are using in your cluster?

 

zak_k
New Contributor III

It's not currently part of DLT in this early phase, but that is the end goal. DBR version is 13.3.

The use case is splitting some records into 2 or more records (when start and end time cross a shift work boundary). 

zak_k
New Contributor III

I'm being told it only occurs on shared mode clusters, I'm going to confirm this now.

zak_k
New Contributor III

I have Confirmed that in small batches the UDF works on both single user and shared clusters, but with large data sets, it only works for single user clusters

zak_k
New Contributor III

After further investigation, It reproduces slightly differently on single user mode.
Single user mode: runs forever
Shared: gives the above message

I've determined that there was a corner case in the dataset which lead to UDF never returning. I am am assuming that on shared the udf runner is killed by a watchdog and that is why the error message is so cryptic (rather than "Timeout" type error)

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group