Databricks Clusters on GCP stop working: "Environment directory not found" issue (waitForEnvironmentFileSystem)

720677
New Contributor III

Starting yesterday, 17/5/2022, I started getting errors while running notebooks and jobs on Databricks clusters on GCP.

The error is:

SparkException: Environment directory not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python

The jobs/notebooks can complete some operations, but others fail, for example:

display(dbutils.fs.ls("/%s" % mount_name))

I tried starting a new cluster and trimming down the init scripts.
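
For reference, a minimal repro along these lines (a sketch, assuming a Python notebook attached to the affected cluster and the spark session Databricks defines) should hit the same code path as the trace below, since any action that runs Python code on the executors has to start Python workers first:

# Hypothetical minimal repro: forces Python workers to start on the executors.
# On the affected clusters this fails with "Environment directory not found
# at /local_disk0/.ephemeral_nfs/cluster_libraries/python".
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined as `spark` in a Databricks notebook

spark.range(100).rdd.map(lambda row: row.id * 2).count()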

The full error:

22/05/18 05:30:09 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) (10.71.1.3 executor 0): org.apache.spark.SparkException: Environment directory not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python
    at org.apache.spark.util.DatabricksUtils$.waitForEnvironmentFileSystem(DatabricksUtils.scala:685)
    at org.apache.spark.api.python.PythonWorkerFactory.$anonfun$startDaemon$1(PythonWorkerFactory.scala:273)
    at org.apache.spark.api.python.PythonWorkerFactory.$anonfun$startDaemon$1$adapted(PythonWorkerFactory.scala:273)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:273)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:185)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:134)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:209)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:251)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:77)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.sql.execution.SQLExecutionRDD.$anonfun$compute$1(SQLExecutionRDD.scala:57)
    at org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:170)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:57)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:826)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1670)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:829)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:684)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
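
A driver-side sanity check (a sketch; the path comes straight from the error message, and it only inspects the driver node, not the executors where the task actually fails):

# Check whether the ephemeral NFS environment directory exists on the driver.
import os

env_dir = "/local_disk0/.ephemeral_nfs/cluster_libraries/python"

if os.path.isdir(env_dir):
    print("Environment directory present on the driver:", os.listdir(env_dir)[:10])
else:
    print("Environment directory missing on the driver:", env_dir)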

1 ACCEPTED SOLUTION

720677
New Contributor III

Databricks Support detected an issue with the NFS mounts on GCP.

Looks like DBR 10.X versions were affected.

After several hours they fixed it and now the same clusters are back to normal.
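
A quick way to confirm the clusters are back to normal from a notebook (a sketch, assuming the sc SparkContext that Databricks notebooks define and the path from the original error):

# Runs a tiny job whose Python workers can only start once
# waitForEnvironmentFileSystem succeeds again, and reports whether the
# environment directory is visible on each executor that ran a partition.
import os
import socket

env_dir = "/local_disk0/.ephemeral_nfs/cluster_libraries/python"

def check(_):
    return [(socket.gethostname(), os.path.isdir(env_dir))]

print(sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
        .mapPartitions(check)
        .collect())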

3 REPLIES

Kaniz
Community Manager

Hi @Pablo (Ariel),

  1. This article describes several scenarios in which a cluster fails to launch and provides troubleshooting steps for each scenario, based on error messages found in the logs.
  2. This article explains the configuration options available when you create and edit Databricks clusters. It focuses on creating and editing clusters in the UI; for other methods, see the Clusters CLI, Clusters API 2.0, and the Databricks Terraform provider (a sketch of pulling cluster events with the API follows below).
  3. For help deciding what combination of configuration options suits your needs best, see cluster configuration best practices.
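
Following up on point 2, here is a sketch of pulling recent cluster events with the Clusters API 2.0 to find the error messages the troubleshooting article refers to (DATABRICKS_HOST, DATABRICKS_TOKEN, and the cluster ID are placeholders for your workspace, and the requests package is assumed to be available):

# Fetch the most recent events for a cluster via the Clusters API 2.0.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.gcp.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # a personal access token

resp = requests.post(
    f"{host}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {token}"},
    json={"cluster_id": "<your-cluster-id>", "limit": 25},
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    print(event.get("timestamp"), event.get("type"), event.get("details", {}))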

Kaniz
Community Manager

Hi @Pablo (Ariel), glad to know that it works smoothly now. Shall we mark this thread as resolved? Would you like to mark your answer as the best?
