a month ago
Never manually delete or alter files inside a checkpoint directory, as doing so will corrupt the Auto Loader stream.
Auto Loader tracks discovered files in the checkpoint location using RocksDB, which is how it provides exactly-once ingestion guarantees.
- You can upgrade to Databricks Runtime 17 or above for high-volume or long-lived ingestion streams.
- You can control the size of the checkpoint state with the cloudFiles.maxFileAge option, which expires file events older than the given period. Set it to around 30 days if your pipeline allows it.
- You can use Auto Loader's cleanSource option, which deletes or archives source files after they have been processed successfully.
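As a rough sketch of how the last two suggestions might be combined, the helper below builds an options dictionary for an Auto Loader read. The function name `auto_loader_options` and the paths in the usage comment are hypothetical. The option keys `cloudFiles.cleanSource` and `cloudFiles.cleanSource.moveDestination`, and the `"30 days"` interval format, are written from memory of the Databricks documentation, so please verify them against the docs for your runtime before relying on this.

```python
from typing import Dict, Optional


def auto_loader_options(
    file_format: str,
    max_file_age: str = "30 days",
    clean_source: Optional[str] = None,
    move_destination: Optional[str] = None,
) -> Dict[str, str]:
    """Build Auto Loader options (hypothetical helper, not a Databricks API)."""
    opts = {
        "cloudFiles.format": file_format,
        # Expire tracked file events older than this period (assumed interval string).
        "cloudFiles.maxFileAge": max_file_age,
    }
    if clean_source is not None:
        mode = clean_source.upper()
        if mode not in {"DELETE", "MOVE"}:
            raise ValueError("clean_source must be 'DELETE' or 'MOVE'")
        # Delete or archive source files once they are processed.
        opts["cloudFiles.cleanSource"] = mode
        if mode == "MOVE":
            if not move_destination:
                raise ValueError("move_destination is required when clean_source is 'MOVE'")
            opts["cloudFiles.cleanSource.moveDestination"] = move_destination
    return opts


# Usage on Databricks, where `spark` is predefined (paths are placeholders):
# opts = auto_loader_options("json", clean_source="MOVE", move_destination="/Volumes/raw/archive")
# df = spark.readStream.format("cloudFiles").options(**opts).load("/Volumes/raw/landing")
```

Keeping the options in a plain dictionary makes it easy to review or log exactly what the stream was started with, without touching the checkpoint directory itself.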