Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Refresh delta

Fredolebeau80
New Contributor II

How do I refresh a Delta table with new raw data from a CDC JSON file?

2 REPLIES

Chevron
New Contributor II

Read the CDC JSON file containing the new raw data, apply the necessary transformations, and load the result into a staging table in Delta format. Then apply the changes to the target Delta tables using an appropriate merge operation from the staging table.
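
The staging approach described above could look roughly like the following. This is only a sketch under assumptions: the function name refreshFromCdc, the paths and table names, and the "key" column are all hypothetical, and the null-key filter is just a stand-in for whatever transformations the data really needs. It only upserts; delete handling would need an extra merge clause.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// All names here (paths, table names, the "key" column) are placeholders.
def refreshFromCdc(spark: SparkSession, cdcPath: String,
                   stagingTable: String, targetTable: String): Unit = {
  // 1. Read the raw CDC JSON; the filter stands in for your real transformations.
  val transformed = spark.read.json(cdcPath).filter(col("key").isNotNull)

  // 2. Land the batch in a Delta staging table, replacing the previous batch.
  transformed.write.format("delta").mode("overwrite").saveAsTable(stagingTable)

  // 3. Upsert the staged rows into the target table, matching on the key.
  DeltaTable.forName(spark, targetTable).as("t")
    .merge(spark.table(stagingTable).as("s"), "s.key = t.key")
    .whenMatched().updateAll()
    .whenNotMatched().insertAll()
    .execute()
}
```

In a notebook this might be called as refreshFromCdc(spark, "path/to/cdc.json", "cdc_staging", "myDeltaTable"). Keeping the staging table around makes it easier to inspect a batch before or after it is merged.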

Vinay_M_R
Databricks Employee

To refresh a Delta table with new raw data from a CDC JSON file, you can use change data capture (CDC) to update the table based on changes in the source data. Here are the steps:
1. Create a streaming table using the CREATE OR REFRESH STREAMING TABLE statement in SQL or the create_streaming_table() function in Python.
2. Use an APPLY CHANGES INTO statement to specify the source, keys, and sequencing for the change feed.
3. Use the APPLY CHANGES statement in SQL or the apply_changes() function in Python to create the statement defining the CDC processing.
4. Alternatively, if you are not running a Delta Live Tables pipeline (which APPLY CHANGES requires), load the CDC data into a DataFrame and use the MERGE INTO statement to merge it into the original Delta table.
Here is an example code snippet in Scala:
---------------------------------------------------------------------
%scala
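import io.delta.tables.DeltaTable

// Target Delta table, looked up by name
val deltaTable = DeltaTable.forName("myDeltaTable")
// CDC batch; each row is expected to carry a boolean "deleted" flag
val cdcDF = spark.read.json("path/to/cdc.json")

deltaTable.as("t")
  .merge(cdcDF.as("s"), "s.key = t.key")
  // Matched row flagged as deleted in the source: remove it from the target
  .whenMatched("s.deleted = true")
  .delete()
  // Any other matched row: overwrite the target values with the source values
  .whenMatched()
  .updateAll()
  // Unmatched row that is not a delete: insert it
  .whenNotMatched("s.deleted = false")
  .insertAll()
  .execute()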
----------------------------------------------------------------------------
This code assumes that the CDC data is in a JSON file located at "path/to/cdc.json", that the Delta table you want to update is named "myDeltaTable", and that each CDC row has a boolean "deleted" column. The merge operation matches rows in the CDC data to rows in the Delta table on the "key" column. If a matching CDC row has "deleted" set to true, the corresponding row in the Delta table is deleted. If a CDC row matches and is not marked as deleted, the row in the Delta table is updated to the values in the CDC data. If a CDC row does not match any row in the Delta table and "deleted" is false, a new row is inserted; unmatched rows marked as deleted are ignored.
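
One thing to watch for: a Delta merge can fail when several source rows match the same target row, which happens easily when a CDC file holds more than one change per key. A possible way to handle this, sketched below, is to keep only the newest change per key before merging. The helper name latestChangePerKey and the idea of a sequence column (for example a change timestamp or log sequence number) are assumptions; the original file may name or order its changes differently.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Hypothetical helper: keep only the most recent change for each key.
// seqCol must increase with every change to the same key.
def latestChangePerKey(cdc: DataFrame, keyCol: String, seqCol: String): DataFrame = {
  // Rank the changes of each key from newest to oldest
  val newestFirst = Window.partitionBy(col(keyCol)).orderBy(col(seqCol).desc)
  cdc.withColumn("_rn", row_number().over(newestFirst))
    .filter(col("_rn") === 1) // rank 1 is the newest change
    .drop("_rn")
}
```

The result can then replace cdcDF in the merge, for example latestChangePerKey(spark.read.json("path/to/cdc.json"), "key", "sequence_num"), where "sequence_num" is a placeholder for whichever ordering column the CDC feed provides.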
