
Error ingesting zip files: ExecutorLostFailure Reason: Command exited with code 50

nikhilmb
New Contributor II

Hi,

We are trying to ingest zip files into an Azure Databricks Delta Lake using the COPY INTO command.

There are 100+ zip files with an average size of ~300 MB each.
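
For context, the statement we run is essentially the sketch below (the table name and storage path are placeholders, not our real ones):

```python
# Minimal sketch of the COPY INTO call issued from the notebook.
# "raw.landing.zip_files" and the abfss path are hypothetical placeholders.
spark.sql("""
    COPY INTO raw.landing.zip_files
    FROM 'abfss://landing@ourstorageaccount.dfs.core.windows.net/zips/'
    FILEFORMAT = BINARYFILE
    PATTERN = '*.zip'
""")
```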

Cluster configuration:

  • 1 driver: 56GB, 16 cores
  • 2-8 workers: 32GB, 8 cores (each). Autoscaling enabled.

The following Spark parameters are set at the cluster level:

  • spark.default.parallelism 150
  • spark.executor.memory 30g

The following Spark parameters are set at the notebook level (while running the COPY INTO command):

spark = (
    SparkSession.builder.appName("YourApp")
    .config("spark.sql.execution.arrow.enabled", "true")
    .config("spark.sql.execution.arrow.maxRecordsPerBatch", "100")
    .config("spark.databricks.io.cache.maxFileSize", "2G")
    .config("spark.network.timeout", "1000s")
    .config("spark.driver.maxResultSize", "2G")
    .getOrCreate()
)

We are consistently getting the following error while trying to ingest the zip files:

Job aborted due to stage failure: Task 77 in stage 33.0 failed 4 times, most recent failure: Lost task 77.3 in stage 33.0 (TID 1667) (10.139.64.12 executor 20): ExecutorLostFailure (executor 20 exited caused by one of the running tasks) Reason: Command exited with code 50

The error stack looks like this:

Py4JJavaError: An error occurred while calling o360.sql. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 77 in stage 33.0 failed 4 times, most recent failure: Lost task 77.3 in stage 33.0 (TID 1667) (10.139.64.12 executor 20): ExecutorLostFailure (executor 20 exited caused by one of the running tasks) Reason: Command exited with code 50

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3628)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3559)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3546)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3546)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1521)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1521)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1521)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3875)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3787)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3775)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1245)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV

This works for a smaller number of zip files (up to 20). Even that did not work with the default cluster configuration; we had to increase the driver and worker sizes and raise the parallelism and executor memory options at the cluster level, as mentioned above. Now this higher configuration also fails when we try to ingest more zip files. We would rather not increase the cluster configuration any further, as that is not an optimal solution and the number of files can keep growing.

Please advise.

CC: @Anup

3 REPLIES

nikhilmb
New Contributor II

Thanks for the response.

We tried all the suggestions in the post. It's still failing.

I think Spark tries to unzip the files during ingestion and that is where it runs out of memory. Maybe ingesting zip files is not supported yet. We are now exploring the Unity Catalog volume option to ingest the zip files and access them from the Delta Lake, roughly as sketched below.
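
For anyone following along, this is the kind of thing we are testing (the volume and storage names are placeholders, not our actual ones):

```python
# Sketch only: copy zip files from cloud storage into a Unity Catalog volume.
# "main.raw.landing_zips" and the abfss path are hypothetical placeholders.
src = "abfss://landing@ourstorageaccount.dfs.core.windows.net/zips/"
dst = "/Volumes/main/raw/landing_zips/"

# dbutils.fs.cp copies files; recurse=True walks the whole source directory.
dbutils.fs.cp(src, dst, recurse=True)
```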

nikhilmb
New Contributor II

Just in the hope that this might benefit other users: we decided to go with the good old way of mounting the cloud object store onto DBFS and then ingesting the data from the mounted drive into a Unity Catalog-managed volume (a rough sketch follows). We tried this for the 500+ zip files and it is working as expected.
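
A rough outline of what that looks like for us, assuming a service principal for the mount; the container, storage account, secret scope, and volume names below are placeholders:

```python
# Sketch only: mount an ADLS Gen2 container on DBFS, then copy into a UC volume.
# All names, scopes, and paths are hypothetical placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the container (skip if it is already mounted).
dbutils.fs.mount(
    source="abfss://landing@ourstorageaccount.dfs.core.windows.net/",
    mount_point="/mnt/landing",
    extra_configs=configs,
)

# Copy the zip files from the mount into the Unity Catalog-managed volume.
dbutils.fs.cp("/mnt/landing/zips/", "/Volumes/main/raw/landing_zips/", recurse=True)
```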

nikhilmb
New Contributor II

Although we were able to copy the zip files onto the Databricks volume, we were not able to share them with any system outside of the Databricks environment. I guess Delta Sharing does not support sharing files that are on UC volumes.
