06-06-2025 06:30 PM
Hi @Ranga_naik1180, let's take an example to understand this:
Flow of the pipeline: Bronze -> Silver -> Gold
In storage you have two files: 1.json is the original record, and 2.json is a newer file that updates the same record, changing the value of B from "b" to "b_new".
1.json -> {"A": "a", "B": "b"}
2.json -> {"A": "a", "B": "b_new"}
For this case, the change data feed (changeFeed) is a solution.
_delta_log -> This records the second write as an update to the existing entry, so the table itself is no longer append-only. The _change_feed, on the other hand, stores that same update as newly inserted change rows. Streaming supports append-only sources, which is why we need to read from the _change_feed: there, every change, including the update, arrives as an INSERT.
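As a minimal sketch of what reading from the change feed can look like in PySpark: the function name `read_silver_changes`, the table name `silver`, and the starting version are assumptions for illustration, and the table is assumed to already have `delta.enableChangeDataFeed = true` set.

```python
def read_silver_changes(spark, table_name="silver", starting_version=0):
    """Return a streaming DataFrame over the table's change data feed.

    `spark` is an existing SparkSession. The table name and starting
    version are placeholders, not values from the original question.
    """
    return (
        spark.readStream
        .format("delta")
        # Read change rows instead of the table's current snapshot.
        .option("readChangeFeed", "true")
        # Table version to start reading changes from (assumed here).
        .option("startingVersion", starting_version)
        .table(table_name)
    )
```

Each row returned this way carries the extra `_change_type` column, so the downstream Gold logic can tell inserts and updates apart.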
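Abc1.parquet
    {"A": "a", "B": "b"} (INSERT)
Abc2.parquet
    {"A": "a", "B": "b_new"} (UPDATE)
_change_feed
    abc1_change.parquet (INSERT)
        {"A": "a", "B": "b", "change_type": "INSERT"}
    abc2_change.parquet (INSERT)
        {"A": "a", "B": "b", "change_type": "UPDATE", "change_image": "pre_image"}
        {"A": "a", "B": "b_new", "change_type": "UPDATE", "change_image": "post_image"}
Note that the file and field names above are simplified for illustration. In an actual Delta table the change type appears in the _change_type column with the values insert, update_preimage, update_postimage, and delete, and the change files sit in a folder named _change_data under the table path.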
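To make the idea concrete, here is a toy model in plain Python of how an update turns into appended change rows. It is only a sketch under assumptions: `change_rows` is a hypothetical helper, the key column `A` is assumed to identify a record, and this is not how Delta computes the feed internally. The `_change_type` values are the ones Delta uses.

```python
def change_rows(before, after, key="A"):
    """Diff two table snapshots (lists of dicts) into append-only change rows."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    out = []
    for k, row in new.items():
        if k not in old:
            out.append({**row, "_change_type": "insert"})
        elif row != old[k]:
            # An update is emitted as two new rows: the old and the new image.
            out.append({**old[k], "_change_type": "update_preimage"})
            out.append({**row, "_change_type": "update_postimage"})
    for k, row in old.items():
        if k not in new:
            out.append({**row, "_change_type": "delete"})
    return out


v0 = []                          # empty table
v1 = [{"A": "a", "B": "b"}]      # after loading 1.json
v2 = [{"A": "a", "B": "b_new"}]  # after loading 2.json

# The feed only ever grows; nothing already written is modified.
feed = change_rows(v0, v1) + change_rows(v1, v2)
for row in feed:
    print(row)
```

The feed ends up with three rows (one insert, then the pre-image and post-image of the update), which is why a stream can consume it as append-only input.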