<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How to Design a Data Quality Framework for Medallion Architecture Data Pipeline in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-design-a-data-quality-framework-for-medallion/m-p/139550#M51226</link>
    <description>&lt;P&gt;This is a very broad topic, so let me break it down into a few key points.&lt;/P&gt;&lt;P&gt;The most practical design involves defining &lt;STRONG&gt;Data Quality Expectations&lt;/STRONG&gt; (rules) in DLT for each layer and implementing an automated process to validate the data against those rules.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Bronze:&lt;/STRONG&gt; Focus on Completeness and Availability&lt;/P&gt;&lt;P&gt;The Bronze layer is your raw, immutable landing zone. The goal is to &lt;STRONG&gt;capture everything&lt;/STRONG&gt; and avoid dropping data. Data quality checks here are minimal and focus on the integrity of the ingestion process itself.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Silver:&lt;/STRONG&gt; Focus on Validity, Consistency, and Uniqueness&lt;/P&gt;&lt;P&gt;The Silver layer is where raw data is cleaned, validated, conformed, and enriched. This is the most crucial stage for implementing business-specific quality rules.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Gold:&lt;/STRONG&gt; Focus on Accuracy and Business Logic&lt;/P&gt;&lt;P&gt;The Gold layer holds final, aggregated, curated, business-ready data. Checks here confirm that the final transformation and aggregation logic is correct.&lt;/P&gt;&lt;P&gt;Reference link for DLT/LDP expectations:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ldp/expectations" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ldp/expectations&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 18 Nov 2025 16:25:58 GMT</pubDate>
    <dc:creator>Raman_Unifeye</dc:creator>
    <dc:date>2025-11-18T16:25:58Z</dc:date>
    <item>
      <title>How to Design a Data Quality Framework for Medallion Architecture Data Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-design-a-data-quality-framework-for-medallion/m-p/139548#M51225</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am building a data pipeline that extracts data from Oracle Fusion and pushes it to a Databricks Delta Lake.&lt;/P&gt;&lt;P&gt;I am using the Bronze, Silver, and Gold (medallion) approach.&lt;/P&gt;&lt;P&gt;Could someone please help me understand how to control all three segments (Bronze, Silver, and Gold) with a data quality framework?&lt;/P&gt;&lt;P&gt;In practical terms, how should I design a data quality framework for a medallion architecture?&lt;/P&gt;&lt;P&gt;Thanks a lot for your help.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2025 15:57:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-design-a-data-quality-framework-for-medallion/m-p/139548#M51225</guid>
      <dc:creator>Pratikmsbsvm</dc:creator>
      <dc:date>2025-11-18T15:57:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to Design a Data Quality Framework for Medallion Architecture Data Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-design-a-data-quality-framework-for-medallion/m-p/139550#M51226</link>
      <description>&lt;P&gt;This is a very broad topic, so let me break it down into a few key points.&lt;/P&gt;&lt;P&gt;The most practical design involves defining &lt;STRONG&gt;Data Quality Expectations&lt;/STRONG&gt; (rules) in DLT for each layer and implementing an automated process to validate the data against those rules.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Bronze:&lt;/STRONG&gt; Focus on Completeness and Availability&lt;/P&gt;&lt;P&gt;The Bronze layer is your raw, immutable landing zone. The goal is to &lt;STRONG&gt;capture everything&lt;/STRONG&gt; and avoid dropping data. Data quality checks here are minimal and focus on the integrity of the ingestion process itself.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Silver:&lt;/STRONG&gt; Focus on Validity, Consistency, and Uniqueness&lt;/P&gt;&lt;P&gt;The Silver layer is where raw data is cleaned, validated, conformed, and enriched. This is the most crucial stage for implementing business-specific quality rules.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Gold:&lt;/STRONG&gt; Focus on Accuracy and Business Logic&lt;/P&gt;&lt;P&gt;The Gold layer holds final, aggregated, curated, business-ready data. Checks here confirm that the final transformation and aggregation logic is correct.&lt;/P&gt;&lt;P&gt;Reference link for DLT/LDP expectations:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ldp/expectations" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/ldp/expectations&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2025 16:25:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-design-a-data-quality-framework-for-medallion/m-p/139550#M51226</guid>
      <dc:creator>Raman_Unifeye</dc:creator>
      <dc:date>2025-11-18T16:25:58Z</dc:date>
    </item>
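The per-layer expectations described in the reply above can be sketched as a small, framework-agnostic rule runner. This is an illustrative sketch only: the rule names and record fields are hypothetical, and in an actual Databricks pipeline these checks would be declared as DLT expectations (e.g. `@dlt.expect_or_drop`) rather than executed by hand.

```python
# Sketch: per-layer data quality rules with a pass/quarantine split.
# Bronze rules are deliberately minimal (never drop raw data); Silver carries
# the business-specific validity rules; Gold checks the final business logic.

RULES = {
    "bronze": [
        ("has_source_marker", lambda r: r.get("_source") is not None),
    ],
    "silver": [
        ("valid_order_id", lambda r: r.get("order_id") is not None),
        ("non_negative_amount", lambda r: (r.get("amount") or 0) >= 0),
    ],
    "gold": [
        ("positive_revenue", lambda r: r.get("total_revenue", 0) > 0),
    ],
}

def validate(layer, records):
    """Split records into (passed, quarantined) using the layer's rules.

    Returns the passing records plus a quarantine list of
    (record, failed_rule_names) pairs for later inspection."""
    passed, quarantined = [], []
    for rec in records:
        failures = [name for name, check in RULES[layer] if not check(rec)]
        if failures:
            quarantined.append((rec, failures))
        else:
            passed.append(rec)
    return passed, quarantined

good, bad = validate("silver", [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},  # fails valid_order_id -> quarantined
])
```

In DLT, the equivalent behaviour is chosen per expectation: warn (keep the row), drop it, or fail the update; the quarantine list above mirrors the "drop and inspect later" pattern.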
    <item>
      <title>Re: How to Design a Data Quality Framework for Medallion Architecture Data Pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-design-a-data-quality-framework-for-medallion/m-p/139559#M51228</link>
      <description>&lt;P&gt;Here’s how you can implement DQ at each stage:&lt;/P&gt;&lt;H4&gt;&lt;STRONG&gt;Bronze Layer&lt;/STRONG&gt;&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Checks&lt;/STRONG&gt;:&lt;UL&gt;&lt;LI&gt;File format validation (CSV, JSON, etc.).&lt;/LI&gt;&lt;LI&gt;Schema validation (column names, types).&lt;/LI&gt;&lt;LI&gt;Row count vs. source system.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Tools&lt;/STRONG&gt;:&lt;UL&gt;&lt;LI&gt;Use &lt;STRONG&gt;Databricks Auto Loader&lt;/STRONG&gt; with schema evolution and badRecordsPath.&lt;/LI&gt;&lt;LI&gt;Implement &lt;STRONG&gt;Great Expectations&lt;/STRONG&gt; or &lt;STRONG&gt;Deequ&lt;/STRONG&gt; for basic validations.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;&lt;STRONG&gt;Silver Layer&lt;/STRONG&gt;&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Checks&lt;/STRONG&gt;:&lt;UL&gt;&lt;LI&gt;Remove duplicates.&lt;/LI&gt;&lt;LI&gt;Validate referential integrity (foreign keys).&lt;/LI&gt;&lt;LI&gt;Standardize data types and formats.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Tools&lt;/STRONG&gt;:&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Delta Live Tables (DLT)&lt;/STRONG&gt; with expectations.&lt;/LI&gt;&lt;LI&gt;Great Expectations for advanced profiling.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Automation&lt;/STRONG&gt;:&lt;UL&gt;&lt;LI&gt;Define expectations in DLT pipelines (expectations block).&lt;/LI&gt;&lt;LI&gt;Fail or quarantine bad records.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;&lt;STRONG&gt;Gold Layer&lt;/STRONG&gt;&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Checks&lt;/STRONG&gt;:&lt;UL&gt;&lt;LI&gt;Business rule validation (e.g., revenue &amp;gt; 0).&lt;/LI&gt;&lt;LI&gt;KPI consistency checks.&lt;/LI&gt;&lt;LI&gt;Aggregation accuracy.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Tools&lt;/STRONG&gt;:&lt;UL&gt;&lt;LI&gt;DLT expectations or custom Spark jobs.&lt;/LI&gt;&lt;LI&gt;Integrate with &lt;STRONG&gt;Unity Catalog&lt;/STRONG&gt; for governance and lineage.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;&lt;STRONG&gt;Practical Tools&lt;/STRONG&gt;&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Great Expectations&lt;/STRONG&gt;: Flexible, open-source, integrates with Databricks.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Delta Live Tables&lt;/STRONG&gt;: Native expectations for Bronze/Silver/Gold.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;AWS Deequ&lt;/STRONG&gt;: For statistical checks.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Unity Catalog&lt;/STRONG&gt;: Governance, lineage, and access control.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 18 Nov 2025 17:24:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-design-a-data-quality-framework-for-medallion/m-p/139559#M51228</guid>
      <dc:creator>nayan_wylde</dc:creator>
      <dc:date>2025-11-18T17:24:24Z</dc:date>
    </item>
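Two of the checks listed in the reply above — the Bronze "row count vs. source system" check and the Gold "aggregation accuracy" check — reduce to simple reconciliation functions. The helper names and tolerance values below are illustrative assumptions, not a Databricks or Great Expectations API:

```python
import math

def check_row_count(source_count: int, bronze_count: int,
                    tolerance: float = 0.0) -> bool:
    """Bronze reconciliation: the ingested row count must match the source
    system's count within a relative tolerance (0.0 = exact match)."""
    if source_count == 0:
        return bronze_count == 0
    return abs(bronze_count - source_count) / source_count <= tolerance

def check_aggregation(detail_rows, gold_total: float,
                      key: str = "amount") -> bool:
    """Gold reconciliation: the curated aggregate must equal the total
    recomputed from the Silver-level detail rows."""
    recomputed = sum(row[key] for row in detail_rows)
    return math.isclose(recomputed, gold_total)

counts_ok = check_row_count(source_count=1000, bronze_count=1000)
total_ok = check_aggregation([{"amount": 10.0}, {"amount": 5.5}],
                             gold_total=15.5)
```

In practice, the results of such checks would be written to an audit table so that failures can gate downstream jobs or trigger alerts.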
  </channel>
</rss>

