<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Scaling SCD on Databricks: Then vs Now in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/scaling-scd-on-databricks-then-vs-now/m-p/148912#M1021</link>
    <description>&lt;P&gt;Between 2019 and 2021, we built a large-scale lakehouse on Databricks supporting multi-market payments processing (7B+ transactions/year).&lt;/P&gt;&lt;P&gt;If ingestion was complex (covered in Part 1), the Silver layer was even more interesting.&lt;/P&gt;&lt;P&gt;Implementing &lt;STRONG&gt;SCD Type 1&lt;/STRONG&gt; at scale on early versions of Delta Lake required significantly more engineering than many people remember.&lt;/P&gt;&lt;P&gt;Even though Delta Lake introduced ACID guarantees and MERGE support, production-grade SCD pipelines still required custom handling for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Deduplication of CDC events&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Out-of-order updates&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Explicit column mapping in MERGE statements&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Schema evolution workarounds&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Multiple-match conflicts in micro-batches&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;To make this reliable, we built a fully parameterized Scala framework that:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Applied window-based deduplication&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Forced schema evolution via controlled writes&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Dynamically generated MERGE statements&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Standardized SCD logic across datasets&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;It worked, but it was heavy.&lt;/P&gt;&lt;P&gt;Fast forward to today: much of that custom framework logic can be replaced by Lakeflow Declarative Pipelines, specifically the AUTO CDC capability.&lt;/P&gt;&lt;P&gt;AUTO CDC abstracts:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Deduplication and sequencing&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Out-of-order handling&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;SCD Type 1 and Type 2 logic&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Delete semantics&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Streaming operational complexity&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What once required hundreds of lines of Spark framework code can now be expressed declaratively.&lt;/P&gt;&lt;P&gt;That’s a major architectural shift.&lt;/P&gt;&lt;P&gt;I wrote a detailed breakdown of:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;The original SCD framework pattern&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The specific Delta Lake limitations we had to work around&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;How AUTO CDC changes the Silver-layer design&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;What to validate before adopting it in production&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;🔗 Full article here: &lt;A href="https://medium.com/@wesley.felipe/databricks-lakehouse-without-the-workarounds-part-2-scd-840d9748920d" target="_blank" rel="noopener"&gt;https://medium.com/@wesley.felipe/databricks-lakehouse-without-the-workarounds-part-2-scd-840d9748920d&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 20 Feb 2026 16:07:26 GMT</pubDate>
    <dc:creator>wesleyfelipe</dc:creator>
    <dc:date>2026-02-20T16:07:26Z</dc:date>
    <item>
      <title>Scaling SCD on Databricks: Then vs Now</title>
      <link>https://community.databricks.com/t5/community-articles/scaling-scd-on-databricks-then-vs-now/m-p/148912#M1021</link>
      <description>&lt;P&gt;Between 2019 and 2021, we built a large-scale lakehouse on Databricks supporting multi-market payments processing (7B+ transactions/year).&lt;/P&gt;&lt;P&gt;If ingestion was complex (covered in Part 1), the Silver layer was even more interesting.&lt;/P&gt;&lt;P&gt;Implementing &lt;STRONG&gt;SCD Type 1&lt;/STRONG&gt; at scale on early versions of Delta Lake required significantly more engineering than many people remember.&lt;/P&gt;&lt;P&gt;Even though Delta Lake introduced ACID guarantees and MERGE support, production-grade SCD pipelines still required custom handling for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Deduplication of CDC events&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Out-of-order updates&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Explicit column mapping in MERGE statements&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Schema evolution workarounds&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Multiple-match conflicts in micro-batches&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;To make this reliable, we built a fully parameterized Scala framework that:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Applied window-based deduplication&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Forced schema evolution via controlled writes&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Dynamically generated MERGE statements&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Standardized SCD logic across datasets&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;It worked, but it was heavy.&lt;/P&gt;&lt;P&gt;Fast forward to today: much of that custom framework logic can be replaced by Lakeflow Declarative Pipelines, specifically the AUTO CDC capability.&lt;/P&gt;&lt;P&gt;AUTO CDC abstracts:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Deduplication and sequencing&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Out-of-order handling&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;SCD Type 1 and Type 2 logic&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Delete semantics&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Streaming operational complexity&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What once required hundreds of lines of Spark framework code can now be expressed declaratively.&lt;/P&gt;&lt;P&gt;That’s a major architectural shift.&lt;/P&gt;&lt;P&gt;I wrote a detailed breakdown of:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;The original SCD framework pattern&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The specific Delta Lake limitations we had to work around&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;How AUTO CDC changes the Silver-layer design&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;What to validate before adopting it in production&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;🔗 Full article here: &lt;A href="https://medium.com/@wesley.felipe/databricks-lakehouse-without-the-workarounds-part-2-scd-840d9748920d" target="_blank" rel="noopener"&gt;https://medium.com/@wesley.felipe/databricks-lakehouse-without-the-workarounds-part-2-scd-840d9748920d&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Feb 2026 16:07:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/scaling-scd-on-databricks-then-vs-now/m-p/148912#M1021</guid>
      <dc:creator>wesleyfelipe</dc:creator>
      <dc:date>2026-02-20T16:07:26Z</dc:date>
    </item>
  </channel>
</rss>

