06-19-2023 05:27 AM
Yes, Auto Loader keeps track of the files it has already ingested, so each file is read only once and you don't get duplicates. If you run a count before and after each Auto Loader run, you'll see that it only adds new data.

Do you have a timestamp column? I'm not sure what the logic in your pipeline looks like, but if you have a timestamp or date_pulled column, you can filter the pipeline query so it only grabs data that doesn't exist yet in the next table in the pipeline: check that table for its latest timestamp/date_pulled value and pull only the rows newer than that.

Alternatively, if you just load all the data into a DataFrame, you can do an upsert into the new table, which updates existing records (if you want that) and inserts new ones. I can only speculate about what your logic looks like without more info 🙂
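
To make the two options concrete, here is a rough sketch of the logic in plain Python, with lists of dicts standing in for the tables. This is not the Spark or Delta Lake API. The helper names (`new_rows_since`, `upsert`), the `id` key, and the sample rows are all made up for illustration; only the `date_pulled` column name comes from the suggestion above. In a real pipeline the upsert would more likely be written as a Delta Lake `MERGE`, and the details depend on your tables.

```python
from datetime import datetime


def new_rows_since(source_rows, target_rows, ts_col="date_pulled"):
    """Keep only source rows newer than the latest timestamp already in the target."""
    if not target_rows:
        # Nothing loaded yet, so every source row is new.
        return list(source_rows)
    high_watermark = max(row[ts_col] for row in target_rows)
    return [row for row in source_rows if row[ts_col] > high_watermark]


def upsert(target_rows, incoming_rows, key="id"):
    """Update rows whose key already exists in the target, insert the rest."""
    merged = {row[key]: row for row in target_rows}
    for row in incoming_rows:
        # Incoming values overwrite existing ones for the same key.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


# Hypothetical data: the target already holds two rows from earlier runs.
target = [
    {"id": 1, "value": "a", "date_pulled": datetime(2023, 6, 17)},
    {"id": 2, "value": "b", "date_pulled": datetime(2023, 6, 18)},
]
# The source holds the old rows plus one changed row and one brand-new row.
source = target + [
    {"id": 2, "value": "b2", "date_pulled": datetime(2023, 6, 19)},
    {"id": 3, "value": "c", "date_pulled": datetime(2023, 6, 19)},
]

fresh = new_rows_since(source, target)  # option 1: filter on the last timestamp
result = upsert(target, fresh)          # option 2: update existing, insert new
```

The timestamp filter keeps the amount of data you move small, while the upsert is what makes a rerun safe: pushing the same rows through again updates them in place instead of duplicating them.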