
Incremental updates to s3 csv files, autoloader, and delta lake updates

lprevost
New Contributor II

I'm using Databricks Auto Loader to incrementally load a series of CSV files on S3, which I update with an API. My typical workflow is to update only the latest year's file each night. But there are occasions when previous years also get updated, because records from a previous year have changed. In that case, I overwrite the CSV file for that year.

I'm following this guide:

Is there a way to get Auto Loader to pick up that previous-year update? When I add a new file (year), Auto Loader ingests it into my lake. But when a previously ingested file is overwritten under the same file name, Auto Loader does not appear to pick it up. My assumption is that it considers the file already ingested (based on its file name) and ignores it.
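To make the suspected behavior concrete, here is a toy Python model of two ways an ingestion checkpoint could decide what counts as "new". It only illustrates the assumption above and is not how Auto Loader is actually implemented; `S3Object`, both helper functions and the `s3://example-bucket/...` paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry in an S3 listing."""
    path: str
    modified: int  # last-modified time, e.g. epoch seconds


def new_by_path(listing, seen_paths):
    """Path-only tracking: a file is new only if its path was never ingested.

    An overwritten file keeps its path, so it is skipped.
    """
    return [obj for obj in listing if obj.path not in seen_paths]


def new_by_path_and_mtime(listing, seen_mtimes):
    """Path + modification-time tracking: a rewritten file counts as new."""
    return [
        obj for obj in listing
        if obj.modified > seen_mtimes.get(obj.path, float("-inf"))
    ]


if __name__ == "__main__":
    base = "s3://example-bucket/data"  # hypothetical location
    # State recorded after the previous nightly run.
    seen_mtimes = {f"{base}/2021.csv": 100, f"{base}/2022.csv": 200}
    # Tonight: 2022.csv was overwritten in place, 2023.csv is brand new.
    listing = [
        S3Object(f"{base}/2021.csv", 100),
        S3Object(f"{base}/2022.csv", 300),
        S3Object(f"{base}/2023.csv", 300),
    ]
    print([o.path for o in new_by_path(listing, set(seen_mtimes))])
    # → ['s3://example-bucket/data/2023.csv']
    print([o.path for o in new_by_path_and_mtime(listing, seen_mtimes)])
    # → ['s3://example-bucket/data/2022.csv', 's3://example-bucket/data/2023.csv']
```

The second rule is roughly what I'm after: a rewrite of an older year's file should be treated as something to ingest again.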

Is there a way to trigger incremental ingestion based on a file's modification date, or by some other method?
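If I read the Auto Loader documentation correctly, this is what the `cloudFiles.allowOverwrites` option is for: it defaults to false, meaning each file is processed once, and setting it to true lets a file that is overwritten in place be picked up again. The sketch below assumes a Databricks runtime that supports that option and the `availableNow` trigger (Spark 3.3+); `csv_autoloader_options`, `start_nightly_load` and all paths and table names are hypothetical.

```python
def csv_autoloader_options(schema_location, allow_overwrites=True):
    """Reader options for an Auto Loader stream over CSV files."""
    return {
        "cloudFiles.format": "csv",
        # Where Auto Loader stores the inferred schema.
        "cloudFiles.schemaLocation": schema_location,
        # "true" lets a file rewritten under the same name be ingested again.
        "cloudFiles.allowOverwrites": str(allow_overwrites).lower(),
        "header": "true",
    }


def start_nightly_load(spark, source_dir, checkpoint_dir, target_table):
    """Hypothetical nightly job; needs a Databricks runtime (Auto Loader)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**csv_autoloader_options(checkpoint_dir))
        .load(source_dir)
        .writeStream.option("checkpointLocation", checkpoint_dir)
        # Process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

One consequence to plan for: an overwritten year file would be re-read in full, so a plain append like this would add duplicate rows for that year. A `foreachBatch` sink that MERGEs into the Delta table on a record key is a common way around that.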

I'm considering going down the path of file notification services (SQS/SNS) to trigger incremental file ingestion.
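For comparison, file notification mode is also enabled through a reader option (`cloudFiles.useNotifications`), and Auto Loader can set up the SNS/SQS resources itself when it has the permissions to do so. My understanding is that this changes how new files are discovered, not the process-each-file-once rule, so re-ingesting an overwritten year file would likely still depend on `cloudFiles.allowOverwrites`. A hedged sketch; `discovery_options` and the region value are hypothetical.

```python
from typing import Optional


def discovery_options(use_notifications: bool, region: Optional[str] = None):
    """Options choosing how Auto Loader discovers new files."""
    opts = {"cloudFiles.useNotifications": str(use_notifications).lower()}
    if use_notifications and region:
        # AWS region where the SNS topic / SQS queue should live.
        opts["cloudFiles.region"] = region
    return opts


# Notification mode plus overwrite handling; these would be merged with
# the rest of the reader options (format, schema location, header, ...).
nightly_opts = {
    **discovery_options(True, region="us-east-1"),  # hypothetical region
    "cloudFiles.allowOverwrites": "true",
}
```

For a handful of yearly CSV files, the default directory-listing mode is probably cheap enough, so notifications seem more about scale than about catching overwrites.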

Any help on which path to take would be appreciated.

1 REPLY

Kaniz
Community Manager

Hi @lprevost! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the forum have an answer first; otherwise I will follow up shortly with a response.
