nikhilj0421
Databricks Employee

Hi @Ranga_naik1180, let's take an example to understand this:

Flow of the pipeline: Bronze -> Silver -> Gold

In storage you have two files: 1.json is the original file, and 2.json updates the value of B from b to b_new.

1.json -> {A: “a”, B: “b”}

2.json -> {A: “a”, B: “b_new”}

In this case, the change data feed is a solution.

_delta_log -> This records the change to the existing row as an update, but in _change_feed the same change is written as newly appended rows. Streaming only supports append-only sources, so an update in the source table breaks a plain streaming read. That is why we need to read from _change_feed: from the stream's point of view, both operations arrive as inserts (appends).
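As a rough sketch of what reading from the change feed can look like in PySpark: the function name read_table_changes and the table name silver_table are made up for illustration, and the sketch assumes the change data feed is already enabled on the source table (table property delta.enableChangeDataFeed = true) and that a SparkSession is passed in.

```python
def read_table_changes(spark, table_name, starting_version=0):
    """Return a streaming DataFrame of row-level changes for a Delta table.

    `spark` is an existing SparkSession; `table_name` is a hypothetical
    table that has the change data feed enabled.
    """
    return (
        spark.readStream.format("delta")
        # Read the change rows instead of the table's current state.
        .option("readChangeFeed", "true")
        # Table version to start reading changes from.
        .option("startingVersion", starting_version)
        .table(table_name)
    )
```

For example, changes = read_table_changes(spark, "silver_table") gives a stream you can use as the input of the next layer, and each row carries a _change_type column describing what happened to it.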

Abc1.parquet

{A: “a”, B: “b”} (INSERT)

Abc2.parquet

{A: “a”, B: “b_new”} (UPDATE)

_change_feed

abc1_change.parquet (INSERT)

{A: “a”, B: “b”, “change_type”: “INSERT”}

abc2_change.parquet (INSERT)

{A: “a”, B: “b”, “change_type”: “UPDATE”, “change_image”: “pre_image”}

{A: “a”, B: “b_new”, “change_type”: “UPDATE”, “change_image”: “post_image”}
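To make the example above concrete, here is a small toy model in plain Python. It is only an illustration of the idea, not how Delta Lake is implemented: apply_upsert, table, and change_feed are hypothetical names. The labels it writes (insert, update_preimage, update_postimage) are the values Delta uses in the _change_type column, which correspond to the INSERT / UPDATE pre_image / UPDATE post_image labels in the listing.

```python
# Toy model only: mimics the idea of a change feed, not Delta's internals.
def apply_upsert(table, change_feed, key, row):
    """Upsert `row` into `table` and append the matching change rows."""
    old = table.get(row[key])
    if old is None:
        # New key: a single appended "insert" change row.
        change_feed.append({**row, "_change_type": "insert"})
    else:
        # Existing key: the table row is overwritten, but the feed
        # only grows, with a before and an after image of the row.
        change_feed.append({**old, "_change_type": "update_preimage"})
        change_feed.append({**row, "_change_type": "update_postimage"})
    table[row[key]] = row


table, change_feed = {}, []
apply_upsert(table, change_feed, "A", {"A": "a", "B": "b"})      # 1.json
apply_upsert(table, change_feed, "A", {"A": "a", "B": "b_new"})  # 2.json

for change in change_feed:
    print(change)
```

The table ends up with one row (B = b_new), while the feed holds three rows that were only ever appended. That append-only property is what lets a stream consume it.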