Heyooooo!
I'm using Azure Databricks and sparklyr to do some geospatial analysis.
Before I actually work with Spark DataFrames, I've been using the R packages `stars` and `sf` to do some preprocessing on my data so that it's easier to interact with later. To install them, I've been running
```
install.packages(c("stars", "sf"))
```
in the notebook I've been working in. I also have two short init scripts that install GDAL and Apache Sedona.
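
For what it's worth, the distributed work I'm submitting is a `sparklyr` call roughly along these lines. This is a simplified sketch, not my actual code: `my_spark_tbl`, the `lon`/`lat` column names, and the function body are placeholders.

```r
library(sparklyr)

# Connect from within the Databricks notebook
sc <- spark_connect(method = "databricks")

# As I understand it, spark_apply() bundles the R packages installed on the
# driver into a packages.<uuid>.tar and ships it to every executor -- the same
# tar that the fetch error below fails to retrieve.
result <- spark_apply(
  my_spark_tbl,                                  # placeholder Spark DataFrame
  function(df) {
    pts <- sf::st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
    pts <- sf::st_transform(pts, 3857)           # reproject to Web Mercator
    as.data.frame(sf::st_coordinates(pts))
  },
  packages = TRUE                                # default: copy local packages to workers
)
```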
Frequently (every two hours or so), my notebook stops letting me distribute computation across the worker nodes of the cluster. Any Spark job I submit returns an error similar to the one reproduced below.
```
22/09/06 19:06:52 INFO Executor: Running task 0.2 in stage 132.0 (TID 1037)
22/09/06 19:06:52 INFO Utils: Fetching spark://<some-ip-that-maybe-should-be-conf>/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar to /local_disk0/spark-2e8bbf01-1403-4f66-a146-9630f1de7a3c/executor-6ba3d3c6-6c28-4674-aa59-b3e2b66d9786/spark-96ed023d-b040-40af-abc0-70d0654f1110/fetchFileTemp8261221790778705602.tmp
22/09/06 19:06:52 INFO Executor: Fetching spark://<some-ip-that-maybe-should-be-conf>/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar with timestamp 1662489105379
22/09/06 19:06:52 ERROR Executor: Exception in task 4.1 in stage 132.0 (TID 1034)
org.apache.spark.SparkException: Failed to fetch spark://<some-ip-that-maybe-should-be-conf>/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar during dependency update
at org.apache.spark.executor.Executor.$anonfun$updateDependencies$4(Executor.scala:1357)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:400)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)
at org.apache.spark.executor.Executor.updateDependencies(Executor.scala:1348)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:807)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:728)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Stream '/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar' was not found.
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
```
I've seen something similar posted here: https://stackoverflow.com/questions/58933492/databricks-spark-error-dependency-update, but that question was resolved and doesn't seem relevant here.
Does anyone know what I might have screwed up such that Databricks can no longer find the packages it needs to copy to the worker nodes, or how I might go about debugging what's wrong here?
Thanks in advance, and please let me know if there's anything I can provide for better debugging, or if there's a better forum to post this on!
I'm on Databricks Runtime 11.1 ML, with Spark 3.3.0 and Scala 2.12.