Autoloader file latency
01-30-2024 08:27 AM
Hi Team,
I would like to understand whether Databricks has a metadata table for Auto Loader that captures information about file arrival and processing.
We are experiencing data issues: table A receives hundreds of files that are processed by Auto Loader,
and in some cases we have noticed that older files are processed after newer files, possibly due to a problem in the source system.
Clear Auto Loader metadata would make it much easier to identify the root cause.
Could you please also share best practices for organizing data in a storage location so that Auto Loader can process it effectively?
01-30-2024 04:06 PM
Are you using directory listing mode or file notification mode for Auto Loader?
01-30-2024 08:58 PM
We are using the default (directory listing mode).
01-31-2024 11:10 AM
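Check the cloud_files_state() table-valued function, which returns the file-level state that an Auto Loader stream records in its checkpoint.
You can find examples here: https://docs.databricks.com/en/ingestion/auto-loader/production.html#querying-files-discovered-by-au...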
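To use that state for the root-cause question above (older files processed after newer ones), one option is to sort the rows by commit time and flag any file whose creation time is older than a file that was already committed. Below is a minimal sketch, assuming each row exposes path, create_time and commit_time columns (check the exact column names against the cloud_files_state() documentation for your runtime). FileState and find_out_of_order are hypothetical helpers for illustration, not part of any Databricks API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class FileState:
    """One row of Auto Loader file state (assumed columns: path, create_time, commit_time)."""
    path: str
    create_time: datetime            # when the file was created in storage
    commit_time: Optional[datetime]  # when Auto Loader committed it; None if not yet committed


def find_out_of_order(rows: Iterable[FileState]) -> List[str]:
    """Return paths of files committed after a file with a newer create_time.

    Sorting by commit_time approximates processing order. Ties (files committed
    in the same batch) are ordered by create_time so they are never flagged.
    """
    committed = sorted(
        (r for r in rows if r.commit_time is not None),
        key=lambda r: (r.commit_time, r.create_time),
    )
    late: List[str] = []
    newest_seen: Optional[datetime] = None
    for r in committed:
        if newest_seen is not None and r.create_time < newest_seen:
            # This file is older than something already processed: it arrived late.
            late.append(r.path)
        else:
            newest_seen = r.create_time
    return late
```

On Databricks you could populate the rows with something like spark.sql("SELECT path, create_time, commit_time FROM cloud_files_state('<checkpoint-path>')").collect() and map each Row to FileState; the checkpoint path is a placeholder for your stream's checkpoint location. The flagged paths are candidates to compare against the source system's delivery logs.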