Data Engineering
Get the list of loaded files from Autoloader

herry
New Contributor III

Hello,

We can use Autoloader to track which files have been loaded from an S3 bucket. My question about Autoloader: is there a way to read Autoloader's internal state to get the list of files that have already been loaded?

I can easily do this with an AWS Glue job bookmark, but I'm not aware of how to do this with Databricks Autoloader.

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III
from pyspark.sql.functions import input_file_name

(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "json")  # adjust to your source format
  .load("path")
  .withColumn("filePath", input_file_name()))

Then you can, for example, write filePath to your stream sink and get the distinct values from there, or use foreach / foreachBatch to insert it into a Spark SQL table.
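A minimal sketch of the foreachBatch approach described above. The table name "loaded_files", the checkpoint path argument, and the "cloudFiles.format" value are illustrative assumptions, not details from the original post:

```python
def record_loaded_files(batch_df, batch_id):
    # foreachBatch callback: append the distinct file paths seen in this
    # micro-batch to a Spark SQL table ("loaded_files" is an assumed name).
    (batch_df.select("filePath")
        .distinct()
        .write.mode("append")
        .saveAsTable("loaded_files"))


def start_stream(spark, source_path, checkpoint_path):
    # Wire an Auto Loader stream to the callback above. Local import so
    # this sketch can be loaded without a Spark installation present.
    from pyspark.sql.functions import input_file_name

    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumption: JSON input files
        .load(source_path)
        .withColumn("filePath", input_file_name())
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .foreachBatch(record_loaded_files)
        .start())
```

Afterwards, `SELECT DISTINCT filePath FROM loaded_files` gives the list of files the stream has processed.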


3 REPLIES

herry
New Contributor III

Thank you! This works for me 🙏

Anonymous
Not applicable

@Herry Ramli - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily?

Thanks!
