Getting FileNotFoundException while using cloudFiles

dannythermadom — Tue, 05 Sep 2023 11:14:42 GMT

Hi,
Following is the code i am using the ingest the data incrementally (weekly).

val ssdf = spark.readStream.schema(schema)

.format("cloudFiles")

.option("cloudFiles.format", "parquet")

.load(sourceUrl)

.filter(criteriaFilter)

val transformedDf = ssdf.transform(.....)

val processData = transformedDf

.select(recordFields: _*)

.writeStream

.option("checkpointLocation", outputUrl + "checkpoint/")

.format("parquet")

.outputMode("append")

.option("path", outputUrl + run_id + "/")

.trigger(Trigger.Once())

.start()

processData.processAllAvailable()

processData.stop()

For each week, the data is written to a new folder and checkpoint to the same folder.
This worked fine for 3 to 5 incremental run.
But recently i got the following error :
ERROR: Query termination received for [id=2345245425], with exception: org.apache.spark.SparkException: Job aborted.
Caused by: java.io.FileNotFoundException: Unable to find batch s3://outputPath/20230810063959/_spark_metadata/0
What is the reason for this issue ? Any idea?

Re: Getting FileNotFoundException while using cloudFiles

BilalAslamDbrx — Tue, 05 Sep 2023 13:05:02 GMT

Danny is another process mutating / deleting the incoming files?

Re: Getting FileNotFoundException while using cloudFiles

dannythermadom — Tue, 05 Sep 2023 13:08:46 GMT

New files gets added to the input location. Input files are not deleted or updated ..

topic Re: Getting FileNotFoundException while using cloudFiles in Get Started Discussions

Getting FileNotFoundException while using cloudFiles

Re: Getting FileNotFoundException while using cloudFiles

Re: Getting FileNotFoundException while using cloudFiles