Databricks Clusters on GCP stop working "Environment directory not found" issue - waitForEnvironmentFileSystem in Data Engineering

https://community.databricks.com/t5/data-engineering/databricks-clusters-on-gcp-stop-working-quot-environment/m-p/20230#M13632
<P>Starting yesterday (17/5/2022), I have been getting errors when running notebooks or jobs on Databricks clusters on GCP.</P>
<P><U>The error is:</U></P>
<P>SparkException: Environment directory not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python</P>
<P>The jobs/notebooks can still perform some operations, but others fail, for example:</P>
<P><CODE>display(dbutils.fs.ls("/%s" % mount_name))</CODE></P>
<P>I tried starting a new cluster, and I tried reducing the init scripts.</P>
<P><U>The full error:</U></P>
<PRE>
22/05/18 05:30:09 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) (10.71.1.3 executor 0): org.apache.spark.SparkException: Environment directory not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python
    at org.apache.spark.util.DatabricksUtils$.waitForEnvironmentFileSystem(DatabricksUtils.scala:685)
    at org.apache.spark.api.python.PythonWorkerFactory.$anonfun$startDaemon$1(PythonWorkerFactory.scala:273)
    at org.apache.spark.api.python.PythonWorkerFactory.$anonfun$startDaemon$1$adapted(PythonWorkerFactory.scala:273)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:273)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:185)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:134)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:209)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:251)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:77)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.sql.execution.SQLExecutionRDD.$anonfun$compute$1(SQLExecutionRDD.scala:57)
    at org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:170)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:57)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:826)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1670)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:829)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:684)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
</PRE>
Wed, 18 May 2022 16:45:12 GMT, 720677

Re: Databricks Clusters on GCP stop working "Environment directory not found" issue - waitForEnvironmentFileSystem
https://community.databricks.com/t5/data-engineering/databricks-clusters-on-gcp-stop-working-quot-environment/m-p/20232#M13634
<P>Databricks support detected an issue with the NFS mounts on GCP.</P>
<P>It looks like DBR 10.x versions were affected.</P>
<P>After several hours they fixed it, and the same clusters are now back to normal.</P>
Thu, 19 May 2022 07:15:53 GMT, 720677
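In the trace from the question, the exception is raised by `DatabricksUtils$.waitForEnvironmentFileSystem` while `PythonWorkerFactory.startDaemon` is starting a Python worker on executor 0. That fits the observation that some operations still ran while others failed. Because the root cause was on the platform side (the GCP NFS mounts), the main thing a user can do is recognize the failure. Below is a minimal, stdlib-only sketch that detects this error in a captured driver/executor log, so a job wrapper or alert could label it as this platform issue rather than a bug in the notebook. The names `classify_env_dir_failure` and `EnvDirFailure` are hypothetical and not a Databricks API. The regular expressions are assumptions based only on the log lines quoted in the question.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper, not a Databricks API. The patterns are assumptions
# derived only from the log lines quoted in this thread.
_ENV_DIR_ERROR = re.compile(
    r"SparkException: Environment directory not found at (?P<path>\S+)"
)
# A stack frame line such as "    at org.apache.spark.util.DatabricksUtils$...(DatabricksUtils.scala:685)".
_FRAME = re.compile(r"^\s*at\s+(?P<frame>\S+)", re.MULTILINE)


class EnvDirFailure(NamedTuple):
    path: str            # directory the executor could not find
    raised_by: str       # first stack frame after the exception message ("" if none)
    python_worker: bool  # True if PythonWorkerFactory appears in the trace


def classify_env_dir_failure(log_text: str) -> Optional[EnvDirFailure]:
    """Return details of an 'Environment directory not found' failure, or None."""
    match = _ENV_DIR_ERROR.search(log_text)
    if match is None:
        return None
    # Everything after the exception message is treated as its stack trace.
    trace = log_text[match.end():]
    frame = _FRAME.search(trace)
    return EnvDirFailure(
        path=match.group("path"),
        raised_by=frame.group("frame") if frame else "",
        python_worker="PythonWorkerFactory" in trace,
    )


# Usage with an excerpt of the log from the question.
sample = (
    "22/05/18 05:30:09 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) "
    "(10.71.1.3 executor 0): org.apache.spark.SparkException: Environment directory "
    "not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python\n"
    "    at org.apache.spark.util.DatabricksUtils$.waitForEnvironmentFileSystem(DatabricksUtils.scala:685)\n"
    "    at org.apache.spark.api.python.PythonWorkerFactory.$anonfun$startDaemon$1(PythonWorkerFactory.scala:273)\n"
)
result = classify_env_dir_failure(sample)
print(result.path)           # prints /local_disk0/.ephemeral_nfs/cluster_libraries/python
print(result.python_worker)  # prints True
```

The sketch matches on the message text instead of the exception class alone, because `SparkException` is generic and wraps many unrelated failures. The `python_worker` flag simply records whether the trace passes through `PythonWorkerFactory`, as the one in this thread does.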