Databricks Community

Fredolebeau80 · ‎06-29-2023

How refresh delta table with New raw from CDC Json file.

Chevron · ‎06-29-2023

Read the CDC JSON file containing new raw data, make all the necessary transformations and load into an staging table in delta format and apply the changes to the target delta tables using appropriate merge operation or method from stagin

Vinay_M_R · ‎06-30-2023

To refresh a delta table with new raw data from a CDC JSON file, you can use change data capture (CDC) to update tables based on changes in source data. Here are the steps:1. Create a streaming table using the CREATE OR REFRESH STREAMING TABLE statement in SQL or the create_streaming_table() function in Python.
2. Use an APPLY CHANGES INTO statement to specify the source, keys, and sequencing for the change feed.
3. Use the APPLY CHANGES statement in SQL or the apply_changes() function in Python to create the statement defining the CDC processing.
4. Once you have the CDC data in a DataFrame, use the MERGE INTO statement to merge the data from the CDC table into the original Delta table.
Here is an example code snippet in Scala:
---------------------------------------------------------------------
%scala
val deltaTable = DeltaTable.forName("myDeltaTable")
val cdcDF = spark.read.json("path/to/cdc.json")
deltaTable.as("t")
.merge(cdcDF.as("s"), "s.key = t.key")
.whenMatched("s.deleted = true")
.delete()
.whenMatched()
.updateAll()
.whenNotMatched("s.deleted = false")
.insertAll()
.execute()
----------------------------------------------------------------------------
This code assumes that the CDC data is in a JSON file located at "path/to/cdc.json". It also assumes that the Delta table you want to update is named "myDeltaTable". The merge operation will match rows in the CDC data to rows in the Delta table based on the "key" column. If a row in the CDC data has "deleted" set to true, the corresponding row in the Delta table will be deleted. If a row in the CDC data matches a row in the Delta table, the values in the Delta table will be updated to match the values in the CDC data. If a row in the CDC data does not match any rows in the Delta table, a new row will be inserted into the Delta table.

Databricks Community

Refresh delta

Connect with Databricks Users in Your Area

Submit your feedback and win a $50 gift card!

Share Your Feedback in Our Community Survey

Databricks Named a Leader in the 2024 Gartner® Magic Quadrant™ for Cloud Database Management Systems

Announcing the new Meta Llama 3.3 model on Databricks

Milestone: DatabricksTV Reaches 100 Videos!