How Auto Loader works - file level or row level?

Akshay_Petkar
Contributor III

Does Auto Loader work on file level or row level? If it works on file level and does not process the same file again, then how can we make it pick only the new rows when data is appended to that file?

Akshay Petkar
Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @Akshay_Petkar ,

Auto Loader works at the file level. By default, it is configured with the following option:

cloudFiles.allowOverwrites = false

With this setting, each file is processed exactly once, so rows appended to an already-processed file are not picked up.

If you set this option to true, Auto Loader is guaranteed to process the latest version of the file. Keep in mind, though, that Auto Loader will then reprocess the entire file, not just the new rows, even if the update was only partial.
You can read a detailed description of this behaviour here:

Auto Loader FAQ - Azure Databricks | Microsoft Learn
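
For illustration, here is a minimal PySpark sketch of how the option could be set on an Auto Loader stream. The helper names (auto_loader_options, read_with_auto_loader) and the paths are hypothetical, and I'm assuming a CSV source and a Databricks runtime where the cloudFiles source is available, so treat it as a starting point rather than a tested pipeline:

```python
def auto_loader_options(file_format, schema_location, allow_overwrites=False):
    """Build the option map for an Auto Loader stream (hypothetical helper)."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader keeps the inferred schema (path is an assumption).
        "cloudFiles.schemaLocation": schema_location,
        # false (default): a file is processed once.
        # true: a changed file is read again, and read in full.
        "cloudFiles.allowOverwrites": str(allow_overwrites).lower(),
    }


def read_with_auto_loader(spark, source_path, options):
    """Return a streaming DataFrame over source_path (needs a Databricks runtime)."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**options)
        .load(source_path)
    )


# Example usage on Databricks (paths are placeholders):
# opts = auto_loader_options("csv", "/tmp/_schemas/orders", allow_overwrites=True)
# df = read_with_auto_loader(spark, "/mnt/landing/orders", opts)
```

Because the whole file is read again after an append, the rows that were already loaded will arrive a second time, so the target needs to cope with duplicates.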
