Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Incremental updates to s3 csv files, autoloader, and delta lake updates

lprevost
Contributor

I'm using the Databricks Auto Loader to incrementally load a series of CSV files on S3, which I update via an API. My typical workflow is to update only the latest year's file each night. But there are occasions where previous years also get updated when there are changes to previous-year records. In that case, I overwrite the CSV file for that year.

I'm following this guide:

Is there a way to get Auto Loader to pick up that previous-year update? When I add a new file (year), Auto Loader ingests it into my lake. But when a previously ingested file gets updated under the same file name, it does not appear to be ingested again. My assumption is that Auto Loader treats this file as already ingested (via a filename heuristic) and ignores it.

Is there a way to trigger incremental ingestion via "update date" or some other method?
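One option that may fit this case is Auto Loader's `cloudFiles.allowOverwrites` setting, which, as I understand it, makes a file eligible for reprocessing when it is rewritten under the same name. The sketch below is a minimal illustration and not a confirmed fix for this setup. The paths, the target table, the `key_column`, and the CSV header row are all assumptions. Because an overwritten file is re-read in full, the sketch upserts each micro-batch with a Delta `MERGE` instead of appending, so re-ingested rows do not turn into duplicates.

```python
from typing import Dict


def autoloader_options(schema_location: str) -> Dict[str, str]:
    """Reader options for CSV ingestion that also re-reads overwritten files."""
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.schemaLocation": schema_location,
        # Reprocess a file when it is overwritten under the same name.
        "cloudFiles.allowOverwrites": "true",
        # Assumption: the yearly CSV files carry a header row.
        "header": "true",
    }


def upsert_batch(batch_df, batch_id, target_table: str, key_column: str) -> None:
    """Merge one micro-batch into an existing Delta table on a hypothetical key."""
    # Imported lazily so the options helper works without Delta installed.
    from delta.tables import DeltaTable

    # MERGE requires at most one source row per key, so drop repeats first.
    latest = batch_df.dropDuplicates([key_column])
    (
        DeltaTable.forName(batch_df.sparkSession, target_table)
        .alias("t")
        .merge(latest.alias("s"), f"t.{key_column} = s.{key_column}")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


def start_ingest(spark, source_path, schema_location, checkpoint_path,
                 target_table, key_column):
    """Start an Auto Loader stream that processes available files, then stops."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location))
        .load(source_path)
    )
    return (
        stream.writeStream
        .foreachBatch(lambda df, bid: upsert_batch(df, bid, target_table, key_column))
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .start()
    )
```

`start_ingest` expects a Databricks `spark` session and an already created target Delta table. It could be scheduled nightly after the API refresh, with `key_column` set to whatever uniquely identifies a record in the yearly files.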

I am considering starting down the path of file notification services (SQS/SNS) to trigger the incremental file ingestion.

Any help on which path to use would be appreciated.

0 REPLIES 0
