
Broken S3 file paths in File Notifications for Auto Loader

aonurdemir
Contributor

Suddenly, at "2025-10-23T14:12:48.409+00:00", file paths coming from the file notification queue started to arrive URL-encoded, so our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are the details:

Broken file path coming from the notification queue:

my-sink/prod/app_daily/year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz

The path discovered by directory listing:

my-sink/prod/app_daily/year=2025/month=10/day=23/app_daily+5+35499048368.json.gz

I found these by investigating the output of this query:

select * from cloud_files_state(TABLE(my_catalog.my_schema.app_daily_stream_v2))
order by create_time asc;
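To isolate just the affected entries, the same state function can be filtered on the encoded characters. A hypothetical refinement (the path, create_time, and commit_time column names are assumptions based on the output we inspected):

-- list only entries whose paths contain the URL-encoded '=' (%3D)
select path, create_time, commit_time
from cloud_files_state(TABLE(my_catalog.my_schema.app_daily_stream_v2))
where instr(path, '%3D') > 0
order by create_time asc;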


 
Since the (=) character is encoded as (%3D), our Declarative Pipeline fires this error:
 

org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 76ce493e-ed1e-48ce-bbda-1a9bb85cc9f7, runId = bee6661b-4319-469c-8a72-040dc517e9ff] terminated with exception: Exception thrown in awaitResult: [FAILED_READ_FILE.DBR_FILE_NOT_EXIST] Error while reading file s3://my-sink/prod/app_daily/year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz. [CLOUD_FILE_SOURCE_FILE_NOT_FOUND] A file notification was received for file:s3://my-sink/prod/app_daily/year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz. but it does not exist anymore. Please ensure that files are not deleted before they are processed. To continue your stream, you can set the Spark SQL configuration spark.sql.files.ignoreMissingFiles to true.


 
I checked the S3 bucket and saw that the file is there. Since Auto Loader tries to read from the encoded path, it fires this error.
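For reference, the decoding can be checked directly in SQL; a quick sketch using url_decode (available in recent Databricks runtimes):

-- shows that %3D decodes to '=' and %2B decodes to '+'
select url_decode('year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz');
-- year=2025/month=10/day=23/app_daily+5+35499048368.json.gz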
 
My problem is this: currently, I cannot run the pipeline in either file notification mode or directory listing mode without the ignoreMissingFiles=true option, since the Auto Loader state is dirty with these uncommitted wrong file paths. I don't want to use ignoreMissingFiles since it will skip all that data. I also do not want to run a full refresh since the source is too big. I need to clear those broken URLs from Auto Loader's state.

1 ACCEPTED SOLUTION

K_Anudeep
Databricks Employee

Hello @aonurdemir,

Could you please re-run your pipeline now and check? This issue should now be mitigated; it was caused by a recent internal bug that led to unexpected handling of file paths with special characters.

You can set ignoreMissingFiles to true to get past this error, and remove the flag once the stream has moved past the affected entries.
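If you are working in SQL, a minimal sketch of the temporary workaround; this is the configuration named in the error message, and it should be removed once the broken entries are past:

-- temporary: skip the URL-encoded entries stuck in the Auto Loader state
set spark.sql.files.ignoreMissingFiles = true;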

Anudeep


3 REPLIES


RevanthV
New Contributor III

Hey @K_Anudeep,

Thanks for letting us know that this was a bug and has been mitigated. I tested it again today, since I was getting the same error last week, and it no longer occurs.

aonurdemir
Contributor

Hello @K_Anudeep,

As I mentioned, we realized that the special-character encoding and decoding was broken, so we worked around the issue by changing the path string in Auto Loader, as follows:

old path: s3://my-sink/prod/app_daily/*/*/*/*.json.gz

new path: s3://my-sink/prod/app_daily/year=*/month=*/day=*/*.json.gz
 

After changing the path string, a single run with the option ignoreMissingFiles=true cleared the state. We then removed the option, and the pipeline has continued to run successfully.
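For context, a rough sketch of what the updated source definition looks like in SQL with the new glob; this is simplified, with read_files standing in for our exact pipeline code, and the table name matches the query above:

create or refresh streaming table my_catalog.my_schema.app_daily_stream_v2
as select * from stream read_files(
  's3://my-sink/prod/app_daily/year=*/month=*/day=*/*.json.gz',
  format => 'json'
);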

Regardless, thanks for your clear answer.

PS: I don't know whether our path string followed best practice, but it was working anyway. If you can suggest better, more performant formats, I'd appreciate any help. Thanks.