How to detect gap in filenames (Autoloader)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-19-2024 06:55 AM
So my files arrive at the cloud storage and I have configured an autoloader to read these files.
The files have a monotonically increasing id in their name.
How can I detect a gap and stop the DLT as soon as there is a gap?
eg.
Autoloader finds file1, ingests
Autoloader finds file2, ingests
Autoloader finds file3, ingests
Autoloader finds file5 -> file4 is missing: STOP
Is this possible using DLT? Or should I go for a streaming job?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
11-12-2024 10:43 PM
It doesn't seem like this can be done through the DLT autoloader. Particularly you require an automatic stop without manual intervention. You can write a custom Structured Streaming job and use a sequence-checking logic, and foreachBatch
to process incoming files and detect missing IDs.

