Data Engineering

Few queries on Autoloader

Digvijay_11
New Contributor
  1. How can we retrieve the file name and file path from the trigger and consume them dynamically in a Databricks notebook?
  2. If the same file is modified, with no change in its name but with changed data, will this trigger still work? If not, what is the workaround?
  3. In landing, files arrive with the same name, i.e., the same file is overwritten. If we set the allowOverwrites option to true, will Auto Loader pick up only the modified data incrementally, or are the files reprocessed in full? How do we handle incremental ingestion with Auto Loader when the same file keeps being modified? #Autoloader
1 REPLY

szymon_dybczak
Esteemed Contributor III

Hi @Digvijay_11 ,

1. You can use the file metadata column (_metadata) for that purpose: File metadata column - Azure Databricks | Microsoft Learn
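
As a minimal sketch (the source format, schema location, and landing path below are assumed examples), you can select the file path and name from _metadata directly in the Auto Loader stream:

```python
# Auto Loader stream that exposes the source file path/name via the _metadata column.
# The format, schema location, and landing path are placeholder values.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/demo")
    .load("/mnt/landing/demo")
    .select(
        "*",
        "_metadata.file_path",               # full path of the source file
        "_metadata.file_name",               # file name only
        "_metadata.file_modification_time",  # last modification timestamp
    )
)
```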

2. With the default setting (cloudFiles.allowOverwrites = false), files are processed exactly once. When a file is appended to or overwritten, Auto Loader cannot guarantee which file version will be processed. To allow Auto Loader to process the file again when it is appended to or overwritten, you can set cloudFiles.allowOverwrites to true. In this case, Auto Loader is guaranteed to process the latest version of the file. However, Auto Loader cannot guarantee which intermediate version is processed.
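
For example, enabling it is just an extra option on the stream (the paths below are placeholders):

```python
# Same kind of Auto Loader stream, but allowing overwritten files to be picked up again.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.allowOverwrites", "true")  # reprocess a file when it is overwritten
    .option("cloudFiles.schemaLocation", "/tmp/schemas/landing")
    .load("/mnt/landing/overwritten")
)
```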

3. With cloudFiles.allowOverwrites = true, Auto Loader will reprocess the entire file whenever it is appended to or partially updated.

So:

  • It’s not “incremental diffs” at the file-content level.

  • It ingests the latest full file again whenever a modification is detected; see the sketch below for one way to handle that downstream.
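
One common way to deal with that downstream (this is not something Auto Loader does for you, and the table and key names here are assumed examples) is to upsert each micro-batch into a Delta table with foreachBatch, so re-ingested rows update existing records instead of creating duplicates:

```python
# Sketch: merge every micro-batch into a Delta target keyed on a business key,
# so full-file reprocessing does not create duplicate rows.
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    target = DeltaTable.forName(spark, "main.bronze.orders")  # assumed target table
    (
        target.alias("t")
        .merge(batch_df.alias("s"), "t.order_id = s.order_id")  # assumed key column
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    df.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/orders")  # assumed path
    .start()
)
```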