<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Understanding Dependency Update Failure in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/understanding-dependency-update-failure/m-p/32423#M23623</link>
    <description>&lt;P&gt;Azure Databricks + sparklyr geospatial workflow: Spark jobs intermittently fail with "Failed to fetch spark://.../files/packages.*.tar during dependency update" after installing the R packages stars and sf in a notebook, alongside init scripts for GDAL and Apache Sedona. Full post and stack trace in the item below.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Sep 2022 19:25:15 GMT</pubDate>
    <dc:creator>Brendon_Daugher</dc:creator>
    <dc:date>2022-09-06T19:25:15Z</dc:date>
    <item>
      <title>Understanding Dependency Update Failure</title>
      <link>https://community.databricks.com/t5/data-engineering/understanding-dependency-update-failure/m-p/32423#M23623</link>
      <description>&lt;P&gt;Heyooooo!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm using Azure Databricks and sparklyr to do some geospatial analysis.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Before I actually work with Spark DataFrames, I've been using the R packages stars and sf to do some preprocessing on my data so that it's easier to interact with later.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To install them, I've been running&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;install.packages(c("stars", "sf"))&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;in the notebook I've been working in. I also have two short init scripts to install GDAL and Apache Sedona.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Frequently (every two hours or so), my notebook stops letting me distribute computation across the cluster's worker nodes.
Any Spark job I submit returns an error similar to the one reproduced below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;22/09/06 19:06:52 INFO Executor: Running task 0.2 in stage 132.0 (TID 1037)&lt;/P&gt;&lt;P&gt;22/09/06 19:06:52 INFO Utils: Fetching spark://&amp;lt;some-ip-that-maybe-should-be-conf&amp;gt;/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar to /local_disk0/spark-2e8bbf01-1403-4f66-a146-9630f1de7a3c/executor-6ba3d3c6-6c28-4674-aa59-b3e2b66d9786/spark-96ed023d-b040-40af-abc0-70d0654f1110/fetchFileTemp8261221790778705602.tmp&lt;/P&gt;&lt;P&gt;22/09/06 19:06:52 INFO Executor: Fetching spark://&amp;lt;some-ip-that-maybe-should-be-conf&amp;gt;/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar with timestamp 1662489105379&lt;/P&gt;&lt;P&gt;22/09/06 19:06:52 ERROR Executor: Exception in task 4.1 in stage 132.0 (TID 1034)&lt;/P&gt;&lt;P&gt;org.apache.spark.SparkException: Failed to fetch spark://&amp;lt;some-ip-that-maybe-should-be-conf&amp;gt;/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar during dependency update&lt;/P&gt;&lt;P&gt;at org.apache.spark.executor.Executor.$anonfun$updateDependencies$4(Executor.scala:1357)&lt;/P&gt;&lt;P&gt;at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985)&lt;/P&gt;&lt;P&gt;at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:400)&lt;/P&gt;&lt;P&gt;at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)&lt;/P&gt;&lt;P&gt;at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728)&lt;/P&gt;&lt;P&gt;at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984)&lt;/P&gt;&lt;P&gt;at org.apache.spark.executor.Executor.updateDependencies(Executor.scala:1348)&lt;/P&gt;&lt;P&gt;at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:807)&lt;/P&gt;&lt;P&gt;at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)&lt;/P&gt;&lt;P&gt;at 
com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)&lt;/P&gt;&lt;P&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:728)&lt;/P&gt;&lt;P&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)&lt;/P&gt;&lt;P&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)&lt;/P&gt;&lt;P&gt;at java.lang.Thread.run(Thread.java:748)&lt;/P&gt;&lt;P&gt;Caused by: java.lang.RuntimeException: Stream '/files/packages.288307a0-2e12-11ed-aba7-00163eb9bf17.tar' was not found.&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)&lt;/P&gt;&lt;P&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)&lt;/P&gt;&lt;P&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)&lt;/P&gt;&lt;P&gt;at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)&lt;/P&gt;&lt;P&gt;at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)&lt;/P&gt;&lt;P&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)&lt;/P&gt;&lt;P&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)&lt;/P&gt;&lt;P&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)&lt;/P&gt;&lt;P&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)&lt;/P&gt;&lt;P&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)&lt;/P&gt;&lt;P&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)&lt;/P&gt;&lt;P&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)&lt;/P&gt;&lt;P&gt;at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)&lt;/P&gt;&lt;P&gt;at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)&lt;/P&gt;&lt;P&gt;at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)&lt;/P&gt;&lt;P&gt;... 1 more&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've seen something similar posted here: &lt;A href="https://stackoverflow.com/questions/58933492/databricks-spark-error-dependency-update" alt="https://stackoverflow.com/questions/58933492/databricks-spark-error-dependency-update" target="_blank"&gt;https://stackoverflow.com/questions/58933492/databricks-spark-error-dependency-update&lt;/A&gt;, but that thread seems to have been resolved and looks irrelevant here.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does anyone know what I might have screwed up that would prevent Databricks from finding the packages it needs to copy to the worker nodes, or how I might go about debugging this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance! Please let me know if there's anything I can provide for better debugging, or if there's a better forum to post this on.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm on Databricks Runtime 11.1 ML, with Spark 3.3.0 and Scala 2.12.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 19:25:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/understanding-dependency-update-failure/m-p/32423#M23623</guid>
      <dc:creator>Brendon_Daugher</dc:creator>
      <dc:date>2022-09-06T19:25:15Z</dc:date>
    </item>
  </channel>
</rss>

