cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Refresh delta

Fredolebeau80
New Contributor II

How refresh delta table with New raw from CDC Json file. 

2 REPLIES 2

Chevron
New Contributor II

Read the CDC JSON file containing new raw data, make all the necessary transformations and  load into an staging table in delta format and apply the changes to the target delta tables using appropriate merge operation or method from stagin

Vinay_M_R
Valued Contributor II
Valued Contributor II

To refresh a delta table with new raw data from a CDC JSON file, you can use change data capture (CDC) to update tables based on changes in source data. Here are the steps:1. Create a streaming table using the CREATE OR REFRESH STREAMING TABLE statement in SQL or the create_streaming_table() function in Python.
2. Use an APPLY CHANGES INTO statement to specify the source, keys, and sequencing for the change feed.
3. Use the APPLY CHANGES statement in SQL or the apply_changes() function in Python to create the statement defining the CDC processing.
4. Once you have the CDC data in a DataFrame, use the MERGE INTO statement to merge the data from the CDC table into the original Delta table.
Here is an example code snippet in Scala:
---------------------------------------------------------------------
%scala
val deltaTable = DeltaTable.forName("myDeltaTable")
val cdcDF = spark.read.json("path/to/cdc.json")
deltaTable.as("t")
.merge(cdcDF.as("s"), "s.key = t.key")
.whenMatched("s.deleted = true")
.delete()
.whenMatched()
.updateAll()
.whenNotMatched("s.deleted = false")
.insertAll()
.execute()
----------------------------------------------------------------------------
This code assumes that the CDC data is in a JSON file located at "path/to/cdc.json". It also assumes that the Delta table you want to update is named "myDeltaTable". The merge operation will match rows in the CDC data to rows in the Delta table based on the "key" column. If a row in the CDC data has "deleted" set to true, the corresponding row in the Delta table will be deleted. If a row in the CDC data matches a row in the Delta table, the values in the Delta table will be updated to match the values in the CDC data. If a row in the CDC data does not match any rows in the Delta table, a new row will be inserted into the Delta table.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.