org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while reading file s3://s3-datascience-prod/redshift/daily/raw/ds/product_information/date=2022-02-05/part-00002-d81f7a47-0421-42a7-9187-f421b0c734b9.c000.snappy.parquet. A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement. For more information, see https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions
We are trying to copy a Delta-format file with known-good properties over several days of files with known-bad properties. We initially did this via the S3 CLI but ran into the error above. We then tried copying over the same file paths via dbutils in a Databricks notebook and hit the same error again. How can we reset this to a state where we no longer encounter the error above and our known-good Delta files are copied for every date in the date range?
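For context, this is roughly the dbutils copy we ran from the notebook. The known-good source path and the date list below are illustrative placeholders, not our actual values:

# Rough sketch of the dbutils-based copy we attempted.
# The known-good source path and the dates here are placeholders.
good_file = (
    "s3://s3-datascience-prod/redshift/daily/raw/ds/product_information/"
    "known_good/part-00000.snappy.parquet"
)

for d in ["2022-02-05", "2022-02-06", "2022-02-07"]:
    target = (
        "s3://s3-datascience-prod/redshift/daily/raw/ds/product_information/"
        f"date={d}/part-00000.snappy.parquet"
    )
    # Copy the known-good Parquet file into each bad date partition.
    dbutils.fs.cp(good_file, target)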