12-06-2021 03:12 AM
Hello,
Auto Loader tracks which files have already been loaded from an S3 bucket. My question about Auto Loader: is there a way to read its internal state to get the list of files that have been loaded?
I can easily do this with AWS Glue job bookmarks, but I'm not aware of how to do it with Databricks Auto Loader.
Accepted Solutions
12-06-2021 03:25 AM
.load("path")
.withColumn("filePath",input_file_name())
than you can for example insert filePath to your stream sink and than get distinct value from there or use forEatch / forEatchBatch and for example insert it into spark sql table
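For example, a minimal foreachBatch sketch that appends each batch's distinct file paths to a tracking table (the table name loaded_files, the JSON format, and the checkpoint path are assumptions for illustration):

from pyspark.sql.functions import input_file_name

def record_files(batch_df, batch_id):
    # Keep one row per source file in this batch and append it to a tracking table.
    (batch_df.select("filePath").distinct()
             .write.mode("append")
             .saveAsTable("loaded_files"))

(spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")  # assumed file format
      .load("path")
      .withColumn("filePath", input_file_name())
      .writeStream
      .option("checkpointLocation", "path/to/checkpoint")
      .foreachBatch(record_files)
      .start())

Afterwards, SELECT DISTINCT filePath FROM loaded_files returns the list of files that have been loaded.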
12-09-2021 07:55 AM
Thank you! This works for me 🙏
08-01-2024 02:03 PM
A more efficient way is to query the Auto Loader checkpoint state directly:

SELECT * FROM cloud_files_state('path/to/checkpoint');
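The same query can also be run from Python; a small sketch (the checkpoint path is a placeholder for your stream's actual checkpoint location):

# List the files Auto Loader has ingested, read from the stream's checkpoint.
loaded = spark.sql("SELECT * FROM cloud_files_state('path/to/checkpoint')")
loaded.show(truncate=False)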
12-09-2021 08:12 AM
@Herry Ramli - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily?
Thanks!