Hi @garciargs ,
Yes, in Databricks you can do this using DLT (Delta Live Tables) and Spark Structured Streaming. Enable CDF (Change Data Feed) on both contracts_raw and customer_raw so that all DML changes (inserts, updates, and deletes) on the raw tables are tracked.
-- New Delta table with CDF enabled
CREATE TABLE myDeltaTable (
    id INT,
    name STRING,
    age INT
)
TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Enable CDF on an existing table
ALTER TABLE myDeltaTable
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
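Once CDF is enabled, you can verify that changes are being tracked with the `table_changes` function (here `myDeltaTable` and starting version `1` are just placeholders for your table and version):

```sql
-- Read all changes since table version 1
SELECT * FROM table_changes('myDeltaTable', 1);
```

Each returned row includes the metadata columns `_change_type` (`insert`, `update_preimage`, `update_postimage`, or `delete`), `_commit_version`, and `_commit_timestamp` alongside the table's own columns.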
In a DLT notebook, you can read the change feed from both tables, capturing appends, updates, and deletes, and then update your silver table accordingly. The following code is a rough example of how you can achieve this.
import dlt
from pyspark.sql.functions import col

# Enable CDF on all new tables by default
spark.sql("SET spark.databricks.delta.properties.defaults.enableChangeDataFeed = true")

@dlt.table(table_properties={"quality": "bronze"})
def customer_raw():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")  # Change to your file format
            .load("s3a://<BUCKET_NAME>/<FILE_PATH>/customer"))  # Change to your cloud storage path

@dlt.table(table_properties={"quality": "bronze"})
def contracts_raw():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")  # Change to your file format
            .load("s3a://<BUCKET_NAME>/<FILE_PATH>/contracts"))  # Change to your cloud storage path

@dlt.table(table_properties={"quality": "silver"})
def contracts_silver():
    customer_df = (spark.readStream
                   .option("readChangeFeed", "true")
                   .table("customer_raw"))
    contracts_df = (spark.readStream
                    .option("readChangeFeed", "true")
                    .table("contracts_raw"))
    joined_df = customer_df.join(
        contracts_df,
        customer_df["customer_id"] == contracts_df["customer_id"],
        "inner")
    # Note: You can perform a MERGE statement for each batch of data
    return joined_df.select(customer_df["*"], contracts_df["contract_details"])
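The MERGE mentioned in the comment above boils down to applying each change row to the target by its `_change_type`. Here is a minimal pure-Python sketch of that semantics, outside of Spark, just to illustrate the logic; the `customer_id` key and `apply_changes` name are assumptions for illustration, not a Databricks API:

```python
# Sketch of how Change Data Feed rows are applied to a target table.
# Each change row carries a _change_type of 'insert', 'update_preimage',
# 'update_postimage', or 'delete'; the key column is assumed to be
# 'customer_id', matching the example above.

def apply_changes(target, change_rows, key="customer_id"):
    """Apply CDF change rows (dicts) to target, a dict keyed by `key`."""
    for row in change_rows:
        change_type = row["_change_type"]
        # Drop CDF metadata columns, keep only the data columns
        data = {k: v for k, v in row.items() if not k.startswith("_")}
        if change_type in ("insert", "update_postimage"):
            target[data[key]] = data          # upsert the new image
        elif change_type == "delete":
            target.pop(data[key], None)       # remove the deleted key
        # 'update_preimage' rows carry the old values; nothing to apply
    return target

target = {1: {"customer_id": 1, "name": "Alice"}}
changes = [
    {"_change_type": "insert", "customer_id": 2, "name": "Bob"},
    {"_change_type": "update_preimage", "customer_id": 1, "name": "Alice"},
    {"_change_type": "update_postimage", "customer_id": 1, "name": "Alicia"},
    {"_change_type": "delete", "customer_id": 2, "name": "Bob"},
]
apply_changes(target, changes)
print(target)  # {1: {'customer_id': 1, 'name': 'Alicia'}}
```

In Spark you would express the same upsert/delete logic as a Delta MERGE inside a `foreachBatch` callback, matching on the key column.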
Refer to the following link to see how APPLY CHANGES works in DLT: The APPLY CHANGES APIs: Simplify change data capture with Delta Live Tables | Databricks on AWS
Regards,
Hari Prasad