With this optimized approach, I would suggest creating a view for the Clean layer instead of a physical table:
1. Bronze Table: Raw CDC data (full storage)
2. Clean View: No physical storage - computed on-demand
3. Silver Table: Final processed data with SCD2 history
Result: only two physical copies instead of three, roughly a one-third storage reduction (for example, 100 GB of raw data stored as ~200 GB instead of ~300 GB)!
Sample code:
import dlt
from pyspark.sql.functions import col
# Bronze: Raw streaming data
@dlt.table
def customer_cdc_bronze():
    # Incrementally ingest raw CDC files with Auto Loader
    return spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "json") \
        .load("/path/to/cdc/files/")
# Clean: view only, no physical storage
@dlt.view
def customer_cdc_clean():
    # Drop rows without a key and keep only the columns needed downstream
    return dlt.read_stream("customer_cdc_bronze") \
        .filter(col("customer_id").isNotNull()) \
        .select(
            col("customer_id"),
            col("customer_name").alias("name"),
            col("sequence_number")
        )
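# As an alternative to the .filter() above, DLT expectations can drop
# invalid rows while also recording data-quality metrics in the pipeline
# UI, e.g. (rule name here is just illustrative):
#   @dlt.expect_or_drop("valid_customer_id", "customer_id IS NOT NULL")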
# Silver: final processed table with SCD2 history
dlt.create_streaming_table("customer_silver")
dlt.apply_changes(
    target="customer_silver",
    source="customer_cdc_clean",  # reads from the view, not a stored table
    keys=["customer_id"],
    sequence_by=col("sequence_number"),
    stored_as_scd_type="2"
)
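Once the pipeline runs, apply_changes with SCD type 2 adds __START_AT and __END_AT columns to customer_silver; the current version of each key has a null __END_AT. A minimal query sketch for pulling current records (run outside the pipeline, assuming your session's default catalog/schema points at the pipeline target):

# Current (open) SCD2 records only; historical versions have __END_AT set
current_customers = spark.read.table("customer_silver") \
    .filter(col("__END_AT").isNull())
current_customers.show()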