08-19-2024 04:55 PM
Hi, we are running some workflows on a shared cluster with Databricks Runtime 14.3 LTS, and we intermittently receive the error:
SparkException: Job aborted due to stage failure: Task 2 in stage 78.0 failed 4 times, most recent failure: Lost task 2.3 in stage 78.0 (TID 269) (10.3.67.68 executor 0): java.lang.NoClassDefFoundError: Could not initialize class daemon.safespark.client.SandboxApiClient$
This started happening intermittently last Friday.
If we downgrade the runtime to 14.2 or 14.1, the job runs, but the pipeline is quite large and the total execution time increases a lot on versions prior to 14.3. A rough sketch of how we pin the runtime for the job cluster as a temporary workaround is below.
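For context, this is roughly how we pin the job cluster runtime as the workaround. It is a simplified sketch assuming the Python databricks-sdk Jobs API; the job name, notebook path, and node type are placeholders, not our real setup:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()
w.jobs.create(
    name="validation-pipeline-dbr142-workaround",  # placeholder job name
    tasks=[
        jobs.Task(
            task_key="run_validations",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Repos/team/project/run_validations"  # placeholder path
            ),
            new_cluster=compute.ClusterSpec(
                spark_version="14.2.x-scala2.12",  # pinned below 14.3 until the fix ships
                node_type_id="Standard_DS3_v2",    # placeholder node type
                num_workers=2,
                data_security_mode=compute.DataSecurityMode.USER_ISOLATION,  # shared access mode
            ),
        )
    ],
)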
It's quite difficult to find references to this error. The only mention I found was in the documentation on upgrading runtime versions, which refers to UDFs, safespark, and some changes to how they are handled from 14.3 onward. We do have some UDFs in use, but I can't work out what might be causing this; a minimal sketch of the kind of UDF usage we have is below.
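To give an idea of the UDF usage (this is not our real code; the column name and rule are made up for illustration), the pattern is essentially plain Python UDFs applied in per-column validations, which as far as I understand is what runs through the sandboxed/isolated Python worker on a shared cluster:

from pyspark.sql import functions as F
from pyspark.sql.types import BooleanType

@F.udf(returnType=BooleanType())
def is_invalid(value):
    # stand-in for one of our validation rules
    return value is None or value.strip() == ""

# spark is the SparkSession already available in the Databricks notebook
df = spark.createDataFrame([("abc",), (None,)], ["col_a"])
flagged = df.withColumn("col_a_FLAG", is_invalid(F.col("col_a")))
flagged.filter("col_a_FLAG = True").head(1)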
I'm attaching the full error log below. If anyone has any ideas, I'd appreciate it!
SparkException: Job aborted due to stage failure: Task 2 in stage 78.0 failed 4 times, most recent failure: Lost task 2.3 in stage 78.0 (TID 269) (10.3.67.68 executor 0): java.lang.NoClassDefFoundError: Could not initialize class daemon.safespark.client.SandboxApiClient$
at com.databricks.spark.safespark.ApiAdapter.getNewSandboxAPIClient(ApiAdapter.scala:47)
at com.databricks.spark.safespark.ApiAdapter.client$lzycompute(ApiAdapter.scala:39)
at com.databricks.spark.safespark.ApiAdapter.client(ApiAdapter.scala:39)
at com.databricks.spark.safespark.ApiAdapter.configure(ApiAdapter.scala:62)
at com.databricks.spark.safespark.udf.DispatcherImpl.liftedTree1$1(DispatcherImpl.scala:336)
at com.databricks.spark.safespark.udf.DispatcherImpl.<init>(DispatcherImpl.scala:321)
at com.databricks.spark.safespark.udf.DispatcherImpl$.createDispatcher(DispatcherImpl.scala:743)
at com.databricks.spark.safespark.Dispatcher.liftedTree1$1(Dispatcher.scala:70)
at com.databricks.spark.safespark.Dispatcher.getOrCreateInstance(Dispatcher.scala:68)
at com.databricks.spark.safespark.Dispatcher.createRawConnection(Dispatcher.scala:154)
at com.databricks.spark.api.python.IsolatedPythonWorkerFactory.createRawIsolatedWorker(IsolatedPythonWorkerFactory.scala:228)
at com.databricks.spark.api.python.IsolatedPythonWorkerFactory.create(IsolatedPythonWorkerFactory.scala:293)
at org.apache.spark.SparkEnv.createIsolatedPythonWorker(SparkEnv.scala:300)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:325)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:228)
at org.apache.spark.sql.execution.python.BasePythonUDFRunner.compute(PythonUDFRunner.scala:59)
at org.apache.spark.sql.execution.python.BatchEvalPythonEvaluatorFactory.evaluate(BatchEvalPythonExec.scala:80)
at org.apache.spark.sql.execution.python.EvalPythonEvaluatorFactory$EvalPythonPartitionEvaluator.eval(EvalPythonEvaluatorFactory.scala:114)
at org.apache.spark.sql.execution.python.EvalPythonExec.$anonfun$doExecute$2(EvalPythonExec.scala:77)
at org.apache.spark.sql.execution.python.EvalPythonExec.$anonfun$doExecute$2$adapted(EvalPythonExec.scala:76)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:920)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:920)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:81)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:88)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:87)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:58)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:186)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:151)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:145)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:958)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:105)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:961)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:853)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
JVM stacktrace:
org.apache.spark.SparkException
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3908)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3830)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3817)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3817)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1695)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1680)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1680)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:4154)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4066)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4054)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:54)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class daemon.safespark.client.SandboxApiClient$
at com.databricks.spark.safespark.ApiAdapter.getNewSandboxAPIClient(ApiAdapter.scala:47)
at com.databricks.spark.safespark.ApiAdapter.client$lzycompute(ApiAdapter.scala:39)
at com.databricks.spark.safespark.ApiAdapter.client(ApiAdapter.scala:39)
at com.databricks.spark.safespark.ApiAdapter.configure(ApiAdapter.scala:62)
at com.databricks.spark.safespark.udf.DispatcherImpl.liftedTree1$1(DispatcherImpl.scala:336)
at com.databricks.spark.safespark.udf.DispatcherImpl.<init>(DispatcherImpl.scala:321)
at com.databricks.spark.safespark.udf.DispatcherImpl$.createDispatcher(DispatcherImpl.scala:743)
at com.databricks.spark.safespark.Dispatcher.liftedTree1$1(Dispatcher.scala:70)
at com.databricks.spark.safespark.Dispatcher.getOrCreateInstance(Dispatcher.scala:68)
at com.databricks.spark.safespark.Dispatcher.createRawConnection(Dispatcher.scala:154)
at com.databricks.spark.api.python.IsolatedPythonWorkerFactory.createRawIsolatedWorker(IsolatedPythonWorkerFactory.scala:228)
at com.databricks.spark.api.python.IsolatedPythonWorkerFactory.create(IsolatedPythonWorkerFactory.scala:293)
at org.apache.spark.SparkEnv.createIsolatedPythonWorker(SparkEnv.scala:300)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:325)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:228)
at org.apache.spark.sql.execution.python.BasePythonUDFRunner.compute(PythonUDFRunner.scala:59)
at org.apache.spark.sql.execution.python.BatchEvalPythonEvaluatorFactory.evaluate(BatchEvalPythonExec.scala:80)
at org.apache.spark.sql.execution.python.EvalPythonEvaluatorFactory$EvalPythonPartitionEvaluator.eval(EvalPythonEvaluatorFactory.scala:114)
at org.apache.spark.sql.execution.python.EvalPythonExec.$anonfun$doExecute$2(EvalPythonExec.scala:77)
at org.apache.spark.sql.execution.python.EvalPythonExec.$anonfun$doExecute$2$adapted(EvalPythonExec.scala:76)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:920)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:920)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:81)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:373)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:88)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:87)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:58)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:186)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:151)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:145)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:958)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:105)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:961)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:853)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
File <command-1087921534225957>, line 1
----> 1 results_python = check_python.execute_grouped_validations(validations, exclude= ['N10001'], append_in = results_python)
File /Workspace/Repos/Leonardo/DTFNDATAFIN/vivo/__init__.py:698, in CheckN1.execute_grouped_validations(self, validations, is_bronze_arquivos, execute_only, exclude, append_in)
696 for validation in grouped_validations[rule]:
697 df_tmp = df.filter(f"{validation['column']}_FLAG = True").withColumnRenamed(f"{validation['column']}_FLAG", 'flag_erro')
--> 698 if df_tmp.head(1) != []:
699 if is_bronze_arquivos:
700 item = {
701 'column': validation['column'],
702 'rule': rule,
703 'condition': validation['condition'],
704 'result': self.create_log(df=df_tmp, column_name=validation['column'], error_code=validation['hint'])
705 }
File /databricks/spark/python/pyspark/sql/connect/dataframe.py:641, in DataFrame.head(self, n)
639 rs = self.head(1)
640 return rs[0] if rs else None
--> 641 return self.take(n)
File /databricks/spark/python/pyspark/sql/connect/dataframe.py:646, in DataFrame.take(self, num)
645 def take(self, num: int) -> List[Row]:
--> 646 return self.limit(num).collect()
File /databricks/spark/python/pyspark/sql/connect/dataframe.py:1833, in DataFrame.collect(self)
1832 def collect(self) -> List[Row]:
-> 1833 table, schema = self._to_table()
1835 schema = schema or from_arrow_schema(table.schema, prefer_timestamp_ntz=True)
1837 assert schema is not None and isinstance(schema, StructType)
File /databricks/spark/python/pyspark/sql/connect/dataframe.py:1868, in DataFrame._to_table(self)
1866 def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]:
1867 query = self._plan.to_proto(self._session.client)
-> 1868 table, schema = self._session.client.to_table(query, self._plan.observations)
1869 assert table is not None
1870 return (table, schema)
File /databricks/spark/python/pyspark/sql/connect/client/core.py:987, in SparkConnectClient.to_table(self, plan, observations)
985 req = self._execute_plan_request_with_metadata()
986 req.plan.CopyFrom(plan)
--> 987 table, schema, _, _, _ = self._execute_and_fetch(req, observations)
988 assert table is not None
989 return table, schema
File /databricks/spark/python/pyspark/sql/connect/client/core.py:1619, in SparkConnectClient._execute_and_fetch(self, req, observations, extra_request_metadata, self_destruct)
1616 schema: Optional[StructType] = None
1617 properties: Dict[str, Any] = {}
-> 1619 for response in self._execute_and_fetch_as_iterator(
1620 req, observations, extra_request_metadata or []
1621 ):
1622 if isinstance(response, StructType):
1623 schema = response
File /databricks/spark/python/pyspark/sql/connect/client/core.py:1596, in SparkConnectClient._execute_and_fetch_as_iterator(self, req, observations, extra_request_metadata)
1594 yield from handle_response(b)
1595 except Exception as error:
-> 1596 self._handle_error(error)
File /databricks/spark/python/pyspark/sql/connect/client/core.py:1905, in SparkConnectClient._handle_error(self, error)
1903 self.thread_local.inside_error_handling = True
1904 if isinstance(error, grpc.RpcError):
-> 1905 self._handle_rpc_error(error)
1906 elif isinstance(error, ValueError):
1907 if "Cannot invoke RPC" in str(error) and "closed" in str(error):
File /databricks/spark/python/pyspark/sql/connect/client/core.py:1980, in SparkConnectClient._handle_rpc_error(self, rpc_error)
1977 info = error_details_pb2.ErrorInfo()
1978 d.Unpack(info)
-> 1980 raise convert_exception(
1981 info,
1982 status.message,
1983 self._fetch_enriched_error(info),
1984 self._display_server_stack_trace(),
1985 ) from None
1987 raise SparkConnectGrpcException(status.message) from None
1988 else:
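For reference, the call at line 698 of __init__.py that kicks off the failing stage is just a non-emptiness check on the flagged rows; simplified (column names are placeholders), it is equivalent to:

# simplified equivalent of the check at __init__.py:698
df_tmp = df.filter("col_a_FLAG = True").withColumnRenamed("col_a_FLAG", "flag_erro")
has_errors = len(df_tmp.head(1)) > 0  # materializes at most one row, which triggers the UDF stage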
08-20-2024 12:57 PM
Hi, @l_c_s, this is fixed in https://github.com/databricks/universe/pull/661764; the maintenance release (14.3.12) is rolling out soon.
08-20-2024 01:11 PM - edited 08-20-2024 01:12 PM
Hi, @Retired_mod. Thanks for the answer. The repository you linked to on GitHub returns a 404. Can you tell us what was causing this, so we can get an idea of what we're dealing with?
08-20-2024 01:05 PM
Hi @l_c_s, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community.
If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries.
We appreciate your participation and are here if you need further assistance!