3 weeks ago - last edited 3 weeks ago
Suddenly at "2025-10-23T14:12:48.409+00:00", coming file paths from file notification queue started to be urlencoded. Hence, our pipeline gets file not found exception. I think something has changed suddenly and broke notification system. Here are the details:
Broken file path that started to come from the notification queue:
my-sink/prod/app_daily/year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz
The path discovered by directory listing:
my-sink/prod/app_daily/year=2025/month=10/day=23/app_daily+5+35499048368.json.gz
I found these by investigating the output of this query:
select * from cloud_files_state(TABLE(my_catalog.my_schema.app_daily_stream_v2))
order by create_time asc;
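A variant of the same check that narrows the output to just the suspect entries, as a sketch assuming PySpark on Databricks (where spark is the ambient session):

# List only the state entries whose path still contains the encoded '=' (%3D).
# instr() does a plain substring search, so the '%' here is not a wildcard.
suspect = spark.sql("""
    select path, create_time
    from cloud_files_state(TABLE(my_catalog.my_schema.app_daily_stream_v2))
    where instr(path, '%3D') > 0
    order by create_time asc
""")
suspect.show(truncate=False)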
Since the (=) character is encoded as (%3D), our Declarative Pipeline fires this error:
org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 76ce493e-ed1e-48ce-bbda-1a9bb85cc9f7, runId = bee6661b-4319-469c-8a72-040dc517e9ff] terminated with exception: Exception thrown in awaitResult: [FAILED_READ_FILE.DBR_FILE_NOT_EXIST] Error while reading file s3://my-sink/prod/app_daily/year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz. [CLOUD_FILE_SOURCE_FILE_NOT_FOUND] A file notification was received for file:s3://my-sink/prod/app_daily/year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz. but it does not exist anymore. Please ensure that files are not deleted before they are processed. To continue your stream, you can set the Spark SQL configuration spark.sql.files.ignoreMissingFiles to true.
I checked the S3 bucket and saw that the file is there. Since Auto Loader tries to read the encoded path, it fires this error.
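For anyone who wants to confirm the same mismatch outside the pipeline, here is a minimal sketch: it decodes the notification path with Python's urllib and checks both keys against S3 via boto3. The bucket and key come from the example above; configured AWS credentials are assumed.

from urllib.parse import unquote

import boto3
from botocore.exceptions import ClientError

# Key as delivered by the notification queue (URL-encoded).
encoded_key = "prod/app_daily/year%3D2025/month%3D10/day%3D23/app_daily%2B5%2B35499048368.json.gz"
# Decoding restores the real S3 key: %3D -> '=' and %2B -> '+'.
decoded_key = unquote(encoded_key)

s3 = boto3.client("s3")
for label, key in [("encoded", encoded_key), ("decoded", decoded_key)]:
    try:
        s3.head_object(Bucket="my-sink", Key=key)
        print(f"{label}: exists -> {key}")
    except ClientError:
        print(f"{label}: not found -> {key}")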
My problem is this: currently, I cannot run the pipeline in either file notification mode or directory listing mode without the ignoreMissingFiles=true option, since the Auto Loader state is dirty with these uncommitted wrong file paths. I don't want to use ignoreMissingFiles because it will skip all that data. I also do not want to run a full refresh, since the source is too big. I need to clear those broken URLs from Auto Loader's state.
3 weeks ago
Hello @aonurdemir,
Could you please re-run your pipeline and check? This issue should be mitigated now; it was due to a recent internal bug that led to unexpected handling of file paths with special characters.
You should set ignoreMissingFiles to true to get past this error, and you can remove the flag once the stream has moved past the stale entries.
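For reference, a minimal sketch of applying that flag from a Python notebook; in a Declarative (DLT) pipeline, the equivalent is adding the same key to the pipeline's configuration settings. spark is assumed to be the ambient Databricks session.

# Temporarily tolerate the stale, URL-encoded entries in the Auto Loader state.
# Remove this once the stream has moved past them, so genuinely missing
# files are not silently skipped later.
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")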
2 weeks ago
Hey @K_Anudeep,
Thanks for letting us know that this is a bug and that it has been mitigated. I tested again today, since I was getting the same error last week, and it no longer occurs.
2 weeks ago
Hello @K_Anudeep,
As I mentioned, we realized that the special-character encoding and decoding was broken, so we solved the issue as follows, by changing the path string in Auto Loader:
old path: s3://my-sink/prod/app_daily/*/*/*/*.json.gz
After changing the path string, a run with the option ignoreMissingFiles=true cleared the state. Afterwards, we removed the option and the pipeline continued to run successfully.
Regardless, thanks for your clear answer.
PS: I do not know whether our path string was the best with respect to conventions, but it was working anyway. If you can suggest better and more performant path formats, I would appreciate any help. Thanks.
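For future readers wondering about the path-format question: one commonly used alternative is to spell the Hive-style partition keys out in the glob. This is a hedged sketch, not a confirmed recommendation from this thread; whether it outperforms /*/*/*/ depends on your layout and listing mode.

# Hypothetical alternative pattern with explicit partition keys, which also
# makes the partition structure visible in the directory names.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("s3://my-sink/prod/app_daily/year=*/month=*/day=*/*.json.gz")
)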