12-06-2021 03:12 AM
Hello,
We can use Auto Loader to track which files have been loaded from an S3 bucket. My question about Auto Loader: is there a way to read the Auto Loader database to get the list of files that have been loaded?
I can easily do this with AWS Glue job bookmarks, but I'm not aware of how to do it with Databricks Auto Loader.
12-06-2021 03:25 AM
.load("path")
.withColumn("filePath",input_file_name())
than you can for example insert filePath to your stream sink and than get distinct value from there or use forEatch / forEatchBatch and for example insert it into spark sql table
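For completeness, here is a minimal runnable sketch of that approach; the bucket paths, the JSON file format, and the loaded_files_log table name are assumptions for illustration, not part of the original answer:

from pyspark.sql.functions import input_file_name

# Auto Loader stream that tags each row with its source file path
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")   # assumed file format
    .load("s3://my-bucket/input/")         # assumed source path
    .withColumn("filePath", input_file_name()))

# foreachBatch: record the distinct file paths seen in each micro-batch
def record_files(batch_df, batch_id):
    (batch_df.select("filePath").distinct()
        .write.mode("append")
        .saveAsTable("loaded_files_log"))  # hypothetical tracking table

(df.writeStream
    .option("checkpointLocation", "s3://my-bucket/checkpoints/loader/")  # assumed
    .foreachBatch(record_files)
    .start())

You can then run SELECT DISTINCT filePath FROM loaded_files_log to list the files that have been loaded.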
12-09-2021 07:55 AM
Thank you! This works for me 🙏
08-01-2024 02:03 PM
A more efficient way:
SELECT * FROM cloud_files_state('path/to/checkpoint');
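For reference, cloud_files_state takes the checkpoint location of the Auto Loader stream; the path below is a placeholder, not a value from this thread. The same query can be issued from Python:

# Query Auto Loader's internal file-tracking state via its stream checkpoint
files_df = spark.sql(
    "SELECT * FROM cloud_files_state('s3://my-bucket/checkpoints/loader/')"  # placeholder checkpoint path
)
files_df.show(truncate=False)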
12-09-2021 08:12 AM
@Herry Ramli - Would you be happy to mark Hubert's answer as best so that other members can find the solution more easily?
Thanks!