<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Exploring Data Quality Frameworks in Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/exploring-data-quality-frameworks-in-databricks/m-p/98697#M39806</link>
    <description>&lt;P&gt;I’m currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose.&lt;/P&gt;&lt;P&gt;In the past, I’ve worked with &lt;STRONG&gt;Deequ&lt;/STRONG&gt;, but I’ve noticed that it’s not as widely used anymore, and I’ve heard great expectations around other solutions. I’m curious to learn about your experiences:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;STRONG&gt;What frameworks or tools are you using for Data Quality in Databricks today?&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;How do you approach DQ monitoring, validation, and automation in your pipelines?&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Are there any specific challenges or best practices you'd like to share?&lt;/STRONG&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Any insights or recommendations would be greatly appreciated. Looking forward to hearing your thoughts!&lt;/P&gt;</description>
    <pubDate>Wed, 13 Nov 2024 16:40:56 GMT</pubDate>
    <dc:creator>jommo</dc:creator>
    <dc:date>2024-11-13T16:40:56Z</dc:date>
    <item>
      <title>Exploring Data Quality Frameworks in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/exploring-data-quality-frameworks-in-databricks/m-p/98697#M39806</link>
      <description>&lt;P&gt;I’m currently investigating solutions for Data Quality (DQ) within the Databricks environment and would love to hear what frameworks or approaches you are using for this purpose.&lt;/P&gt;&lt;P&gt;In the past, I’ve worked with &lt;STRONG&gt;Deequ&lt;/STRONG&gt;, but I’ve noticed that it’s not as widely used anymore, and I’ve heard great expectations around other solutions. I’m curious to learn about your experiences:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;STRONG&gt;What frameworks or tools are you using for Data Quality in Databricks today?&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;How do you approach DQ monitoring, validation, and automation in your pipelines?&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Are there any specific challenges or best practices you'd like to share?&lt;/STRONG&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Any insights or recommendations would be greatly appreciated. Looking forward to hearing your thoughts!&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2024 16:40:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exploring-data-quality-frameworks-in-databricks/m-p/98697#M39806</guid>
      <dc:creator>jommo</dc:creator>
      <dc:date>2024-11-13T16:40:56Z</dc:date>
    </item>
    <item>
      <title>Re: Exploring Data Quality Frameworks in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/exploring-data-quality-frameworks-in-databricks/m-p/98737#M39825</link>
      <description>&lt;P&gt;Delta Live Tables (DLT) expectations. Reference:&amp;nbsp;&lt;A href="https://docs.databricks.com/en/delta-live-tables/expectations.html" target="_blank"&gt;https://docs.databricks.com/en/delta-live-tables/expectations.html&lt;/A&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Expectations: DLT lets you define data quality constraints on datasets using expectations, applied to queries with Python decorators or SQL CONSTRAINT clauses. For records that violate an expectation, the action can be to warn (keep the record and log the violation), drop the record, or fail the update. Quarantining invalid records is not a built-in action but can be implemented as a pattern on top of these.&lt;/LI&gt;
&lt;LI&gt;Advanced Validation: You can perform complex data quality checks by defining materialized views using aggregate and join queries.&lt;/LI&gt;
&lt;LI&gt;Portability and Reusability: Data quality rules can be maintained separately from pipeline implementations, stored in a Delta table, and selected by tag when they are applied to a dataset.&lt;/LI&gt;
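&lt;LI&gt;Example: the expectation actions and the tag-based rule lookup described in this list can be sketched in plain Python so that it runs without a Databricks workspace. This is a hedged illustration, not the DLT API. In a real pipeline you would pass a dictionary of {rule name: SQL constraint} to decorators such as @dlt.expect_all, @dlt.expect_all_or_drop, or @dlt.expect_all_or_fail. The RULES list, the get_rules and apply_expectations helpers, and the use of Python predicates instead of SQL expressions are all hypothetical.
```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Check = Callable[[Row], bool]

# Hypothetical rules "table". In DLT the rules would live in a Delta table and
# each constraint would be a SQL expression; Python predicates are an assumption
# made here so the sketch runs without Spark.
RULES: List[dict] = [
    {"name": "valid_id", "tag": "validity",
     "check": lambda r: r.get("id") is not None},
    {"name": "positive_amount", "tag": "validity",
     "check": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] > 0},
    {"name": "has_country", "tag": "completeness",
     "check": lambda r: bool(r.get("country"))},
]


def get_rules(tag: str) -> Dict[str, Check]:
    """Return {rule name: predicate} for every rule carrying the given tag."""
    return {rule["name"]: rule["check"] for rule in RULES if rule["tag"] == tag}


def apply_expectations(rows: List[Row], rules: Dict[str, Check],
                       action: str = "warn") -> Tuple[List[Row], Dict[str, int]]:
    """Emulate the three DLT expectation actions on a list of dict rows.

    warn: keep every row and count violations per rule
    drop: keep only rows that pass every rule
    fail: raise ValueError on the first violating row
    """
    if action not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown action: {action}")
    kept: List[Row] = []
    violations = {name: 0 for name in rules}
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        for name in failed:
            violations[name] += 1
        if failed and action == "fail":
            raise ValueError(f"expectations {failed} violated by row {row}")
        if failed and action == "drop":
            continue
        kept.append(row)
    return kept, violations
```
For example, apply_expectations(rows, get_rules("validity"), action="drop") keeps only the rows that satisfy both validity rules. Keeping the rules as data (here a list, in DLT a Delta table) is what lets the same checks be reused across pipelines by tag.&lt;/LI&gt;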
&lt;/UL&gt;</description>
      <pubDate>Thu, 14 Nov 2024 06:22:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exploring-data-quality-frameworks-in-databricks/m-p/98737#M39825</guid>
      <dc:creator>SparkJun</dc:creator>
      <dc:date>2024-11-14T06:22:30Z</dc:date>
    </item>
    <item>
      <title>Re: Exploring Data Quality Frameworks in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/exploring-data-quality-frameworks-in-databricks/m-p/122082#M46644</link>
      <description>&lt;P&gt;Great Expectations (GE) and other DQ tools can fire a lot of SQL queries, which increases cost and adds delays, so the right choice depends on your requirements. I'm happy to discuss more if you are interested. I'm also planning to make such a tool available to the Databricks community through the Marketplace, since running DQ checks is currently a lot of effort for everyone.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jun 2025 06:40:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exploring-data-quality-frameworks-in-databricks/m-p/122082#M46644</guid>
      <dc:creator>dataoculus_app</dc:creator>
      <dc:date>2025-06-18T06:40:02Z</dc:date>
    </item>
  </channel>
</rss>

