
Getting FileNotFoundException while using cloudFiles

dannythermadom
New Contributor III

Hi,
Following is the code I am using to ingest the data incrementally (weekly):

import org.apache.spark.sql.streaming.Trigger

// Incrementally pick up new parquet files from the source with Auto Loader
val ssdf = spark.readStream.schema(schema)
  .format("cloudFiles")
  .option("cloudFiles.format", "parquet")
  .load(sourceUrl)
  .filter(criteriaFilter)

val transformedDf = ssdf.transform(.....)

// Write each run's output to a new folder, reusing one checkpoint location
val processData = transformedDf
  .select(recordFields: _*)
  .writeStream
  .option("checkpointLocation", outputUrl + "checkpoint/")
  .format("parquet")
  .outputMode("append")
  .option("path", outputUrl + run_id + "/")
  .trigger(Trigger.Once())
  .start()
processData.processAllAvailable()
processData.stop()
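For context, the surrounding values look roughly like this (a sketch; the exact run_id generation is my assumption, based on the folder name in the error below):

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Illustrative driver-side values (assumed, not the real job config):
val sourceUrl = "s3://inputPath/"   // incoming parquet files
val outputUrl = "s3://outputPath/"  // base path for all runs
// A timestamp like "20230810063959", matching the folder in the error below
val run_id = LocalDateTime.now.format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"))
// Data lands in outputUrl + run_id + "/" (a new folder every run),
// while the checkpoint stays at outputUrl + "checkpoint/"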

For each week, the data is written to a new folder, while the checkpoint goes to the same folder.
This worked fine for 3 to 5 incremental runs.
But recently I got the following error:
ERROR: Query termination received for [id=2345245425], with exception: org.apache.spark.SparkException: Job aborted.
Caused by: java.io.FileNotFoundException: Unable to find batch s3://outputPath/20230810063959/_spark_metadata/0
What is the reason for this issue? Any ideas?
2 REPLIES

BilalAslamDbrx
Databricks Employee

Danny, is another process mutating / deleting the incoming files?
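One quick way to check (a sketch using the Hadoop FileSystem API; sourceUrl is assumed from your snippet): list the input path before two consecutive runs and compare modification times, which would reveal files being rewritten in place.

import org.apache.hadoop.fs.Path

// List input files with size and modification time; rerun before the
// next weekly run and diff the output to spot mutated/deleted files.
val inputPath = new Path(sourceUrl)
val fs = inputPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
fs.listStatus(inputPath).foreach { s =>
  println(s"${s.getPath} len=${s.getLen} modified=${s.getModificationTime}")
}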

 

dannythermadom
New Contributor III

New files get added to the input location. Input files are not deleted or updated.
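A note for anyone landing here with the same stack trace: the streaming parquet sink keeps its own commit log under <outputPath>/_spark_metadata. If one checkpoint is reused across runs while the sink path changes (outputUrl + run_id + "/"), the restored checkpoint can reference a batch id that the new folder's _spark_metadata has never recorded, which matches "Unable to find batch .../_spark_metadata/0". One possible workaround, sketched below under the assumption that this is the cause: keep the single checkpoint for incremental progress, but write each micro-batch with a plain batch writer via foreachBatch, which does not depend on _spark_metadata.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.Trigger

// Sketch only: one stable checkpoint tracks which source files were
// processed; the batch writer inside foreachBatch is free to target a
// new folder per run because it never touches _spark_metadata.
val query = transformedDf
  .select(recordFields: _*)
  .writeStream
  .option("checkpointLocation", outputUrl + "checkpoint/") // unchanged across runs
  .trigger(Trigger.Once())
  .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
    batchDf.write
      .mode("append")
      .parquet(outputUrl + run_id + "/") // new folder per weekly run
  }
  .start()
query.processAllAvailable()
query.stop()

The trade-off: foreachBatch gives at-least-once rather than exactly-once file writes, so a retried batch can produce duplicate rows.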
