- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-11-2024 01:45 AM
Here's I how would do it, leveraging the medallion architecture :
- Instead of a temporary table, load everything in a bronze table in append mode. This table should be liquid-clustered, and you should run OPTIMIZE on it regularly. It should also have CDF enabled.
- Next, from this table fetch only the new rows using change datafeed. Something like SELECT * FROM table_changes('bronze_table', last_commit_version)
- Run the MERGE into the silver table, which in your case would be the full updated table, that should also be with liquid clustering, and also with an OPTIMIZE statement scheduled regularly.
This way both your tables are optimized, you keep all of the history in your bronze table but it remains efficient because you're loading only what's necessary with CDF.
It will remain efficient as long as you don't forget to OPTIMIZE. Also note that with liquid clustering, OPTIMIZE is incremental, so it only optimized what's new, so don't hesitate to run that statement often !