11-15-2024 02:58 AM
Can anyone provide a sample MERGE INTO SQL query for implementing SCD Type 2 in Databricks using Delta Tables?
11-15-2024 03:42 AM
3 weeks ago
Is there any limitation to the length of the string passed to md5 function when concatenating multiple columns to generate hash_value field ?
11-15-2024 03:57 AM
also, in PySpark, the same example in pyspark:
from pyspark.sql.functions import col, concat_ws, current_date, lit, md5
source_df = spark.table("source_table")
target_df = spark.table("target_table")
source_with_hash_df = source_df.withColumn("hash_value", md5(concat_ws("|", col("name"), col("address"), col("email"), col("phone"))))
target_df.alias("target").merge(
source_with_hash_df.alias("source"),
"target.customer_id = source.customer_id AND target.is_current = true"
).whenMatchedUpdate(
condition="target.hash_value != source.hash_value",
set={
"valid_to": current_date() - 1,
"is_current": lit(False)
}
).whenNotMatchedInsert(
values={
"customer_id": col("source.customer_id"),
"name": col("source.name"),
"address": col("source.address"),
"email": col("source.email"),
"phone": col("source.phone"),
"valid_from": current_date(),
"valid_to": lit("9999-12-31"),
"is_current": lit(True),
"hash_value": col("source.hash_value")
}
).whenNotMatchedBySourceUpdate(
condition="target.is_current = true",
set={
"valid_to": current_date() - 1,
"is_current": lit(False)
}
)
You have to add an action to execute.
3 weeks ago
Hi @Akshay_Petkar , please refer this code ,
3 weeks ago
@JissMathew and @David_Torrejon , Thanks for sharing the example
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group