Using DLT for Your Use Case
Delta Live Tables (DLT) can be a good fit for your scenario, especially for implementing Slowly Changing Dimension (SCD) Type 2. Here's how you can approach it:
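- Ingestion with Auto Loader: Use Auto Loader to ingest the daily parquet files into your bronze layer. Auto Loader picks up each newly arrived file incrementally. If the daily file overwrites an existing path in place rather than landing under a new name, you may also need to set cloudFiles.allowOverwrites to true so the new version is reprocessed.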
- Bronze Layer Processing: Create a bronze table using DLT that reads from the landing area.
- SCD Type 2 Implementation: Implement SCD Type 2 in the silver layer using DLT's APPLY CHANGES syntax.
Implementation Approach
Here's a high-level implementation strategy:
Bronze Layer:
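    import dlt
    from pyspark.sql.functions import col

    @dlt.table
    def bronze_table():
        # Auto Loader incrementally reads new parquet files from the landing area
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .load("/path/to/landing/area")
            # expose the file's modification time so the silver layer can sequence on it
            .select("*", col("_metadata.file_modification_time").alias("file_modification_time"))
        )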
Silver Layer with SCD Type 2:
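    # The target streaming table must exist before apply_changes writes to it
    dlt.create_streaming_table("silver_table_scd2")

    dlt.apply_changes(
        target = "silver_table_scd2",
        source = "bronze_table",
        keys = ["your_primary_key"],
        # assumes bronze_table exposes a file_modification_time column,
        # for example taken from _metadata.file_modification_time
        sequence_by = col("file_modification_time"),
        stored_as_scd_type = "2"
    )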
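To make the effect of stored_as_scd_type = "2" more concrete, here is a minimal plain-Python sketch of SCD Type 2 semantics. It is only an illustration, not how DLT implements APPLY CHANGES: the apply_scd2 helper and the sample records (id, city) are hypothetical, it assumes batches arrive in sequence order, and it ignores deletes. The __START_AT and __END_AT names mirror the columns DLT adds to SCD Type 2 targets.

```python
def apply_scd2(history, batch, key, seq):
    """Merge one snapshot batch into an SCD Type 2 history list (illustrative only)."""
    # Index the currently open version (no end marker) for each key.
    open_rows = {row[key]: row for row in history if row["__END_AT"] is None}
    for rec in sorted(batch, key=lambda r: r[seq]):
        # Compare only the business attributes, not the sequencing column.
        attrs = {k: v for k, v in rec.items() if k != seq}
        current = open_rows.get(rec[key])
        if current is not None:
            current_attrs = {
                k: v for k, v in current.items()
                if k not in ("__START_AT", "__END_AT")
            }
            if current_attrs == attrs:
                continue  # nothing changed, keep the open version as-is
            current["__END_AT"] = rec[seq]  # close the old version
        new_row = {**attrs, "__START_AT": rec[seq], "__END_AT": None}
        history.append(new_row)
        open_rows[rec[key]] = new_row
    return history


# Three daily snapshots of the same key; only the third one changes a value.
history = []
for day, city in [(1, "Oslo"), (2, "Oslo"), (3, "Bergen")]:
    snapshot = [{"id": 1, "city": city, "file_modification_time": day}]
    apply_scd2(history, snapshot, key="id", seq="file_modification_time")

for row in history:
    print(row)
```

In this sketch the unchanged day-2 snapshot adds no row, while the day-3 change closes the "Oslo" version and opens a new "Bergen" version. That is the kind of history you would expect to see in silver_table_scd2 when a full daily file repeats mostly unchanged rows.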