08-05-2025 07:40 AM
Hi @szymon_dybczak, you bring up some great points. What I'd like to narrow in on is the "data duplication" side of the conversation. In the past, team members have raised concerns about data duplication and the added complexity of having "extra steps" in our data ingestion process, so I want to make sure we properly address these points when deciding whether or not to use AutoLoader.
In your experience, how much would this data duplication add in terms of cost? And is the added complexity (extra steps) really worth it? As a counter-example, the code I shared would load API data directly into a raw Delta table, which removes the need to first land the data in its original .json files. The only downside I see to this approach is that we lose separation of concerns (a valid point), but if we chose to do it this way, I think we would get basically the same benefits as with AutoLoader.
Please let me know if I missed anything!