etsyal1e2r3
Honored Contributor

Yeah, Autoloader keeps track of the files it has already ingested, so each file is read only once, which prevents duplicates. If you do a count before and after each Autoloader run, you'll see that it only adds new data. Now, do you have a timestamp column? I'm not sure what your logic looks like in the pipeline, but if you have a timestamp or date_pulled column, you can filter the pipeline query to grab only the data that doesn't exist yet in the next table in the pipeline, by checking that table for its latest timestamp/date_pulled value. But if you just grab all the data into a dataframe, you can do an upsert into the new table, which will update existing records (if you want) and insert new ones. I can only speculate about what your logic looks like without more info, though 🙂
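
Here's a rough sketch of the "filter on the last date_pulled, then upsert" idea. I don't know your tables, so everything here is made up for illustration: the `source` and `target` tables, the `id` and `value` columns, and the `incremental_upsert` helper are all hypothetical. It uses Python's built-in sqlite3 only so that it runs anywhere; on Databricks the upsert step would be a Delta MERGE rather than SQLite's `ON CONFLICT`.

```python
import sqlite3


def incremental_upsert(conn):
    """Copy rows newer than the target's latest date_pulled, upserting by id."""
    # Watermark: the latest date_pulled already present in the target table.
    # An empty target yields '' so that every ISO date string compares greater.
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(date_pulled), '') FROM target"
    ).fetchone()

    # Only pull rows the target has not seen yet.
    new_rows = conn.execute(
        "SELECT id, value, date_pulled FROM source WHERE date_pulled > ?",
        (watermark,),
    ).fetchall()

    # Upsert: insert new ids, update the row when the id already exists.
    conn.executemany(
        """
        INSERT INTO target (id, value, date_pulled) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            value = excluded.value,
            date_pulled = excluded.date_pulled
        """,
        new_rows,
    )
    conn.commit()
    return len(new_rows)


def demo():
    """Two pipeline runs against hypothetical in-memory tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, value TEXT, date_pulled TEXT)")
    conn.execute(
        "CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT, date_pulled TEXT)"
    )

    # First batch lands in the source table, then the pipeline runs.
    conn.executemany(
        "INSERT INTO source VALUES (?, ?, ?)",
        [(1, "a", "2023-01-01"), (2, "b", "2023-01-01")],
    )
    first = incremental_upsert(conn)

    # Second batch: id 2 is a changed record, id 3 is brand new.
    conn.executemany(
        "INSERT INTO source VALUES (?, ?, ?)",
        [(2, "b2", "2023-01-02"), (3, "c", "2023-01-02")],
    )
    second = incremental_upsert(conn)

    rows = conn.execute(
        "SELECT id, value, date_pulled FROM target ORDER BY id"
    ).fetchall()
    return first, second, rows


if __name__ == "__main__":
    print(demo())
```

The second run only touches the two rows newer than the watermark: id 2 gets updated and id 3 gets inserted, while id 1 is left alone. One caveat with this sketch: the strict `>` comparison assumes everything carrying the watermark date was fully loaded on the previous run, so rows that arrive late with that same date would be skipped.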