<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What Data Quality Framework do you use/recommend? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/122089#M46648</link>
    <description>&lt;P&gt;There are many DQ tools and platforms, but most are SQL-based, so the checks add cost and their results are delayed. The right choice really depends on your use case and problem statement. Sometimes it makes sense to build your own, but most of the time it does not if the tool is meant to be used as a central service.&lt;/P&gt;</description>
    <pubDate>Wed, 18 Jun 2025 07:19:15 GMT</pubDate>
    <dc:creator>dataoculus_app</dc:creator>
    <dc:date>2025-06-18T07:19:15Z</dc:date>
    <item>
      <title>What Data Quality Framework do you use/recommend?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/54025#M29965</link>
      <description>&lt;P&gt;Hi guys,&lt;/P&gt;&lt;P&gt;In your opinion, which Data Quality Framework (or technique) would you recommend?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Nov 2023 23:47:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/54025#M29965</guid>
      <dc:creator>William_Scardua</dc:creator>
      <dc:date>2023-11-27T23:47:52Z</dc:date>
    </item>
    <item>
      <title>Re: What Data Quality Framework do you use/recommend?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/80592#M36081</link>
      <description>&lt;P&gt;Hi there!&lt;/P&gt;&lt;P&gt;You could also take a look at &lt;A href="https://rudol.ai" target="_self"&gt;Rudol&lt;/A&gt;. It has native Databricks support and covers Data Quality validations and Data Governance. It also lets non-technical roles, such as Business Analysts or Data Stewards, take part in data quality through no-code validations and integrations with everyday tools like Slack or Microsoft Teams.&lt;/P&gt;&lt;P&gt;Have a high-quality day!&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jul 2024 15:39:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/80592#M36081</guid>
      <dc:creator>joarobles</dc:creator>
      <dc:date>2024-07-25T15:39:07Z</dc:date>
    </item>
    <item>
      <title>Re: What Data Quality Framework do you use/recommend?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/122089#M46648</link>
      <description>&lt;P&gt;There are many DQ tools and platforms, but most are SQL-based, so the checks add cost and their results are delayed. The right choice really depends on your use case and problem statement. Sometimes it makes sense to build your own, but most of the time it does not if the tool is meant to be used as a central service.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jun 2025 07:19:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/122089#M46648</guid>
      <dc:creator>dataoculus_app</dc:creator>
      <dc:date>2025-06-18T07:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: What Data Quality Framework do you use/recommend?</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/122101#M46652</link>
      <description>&lt;P&gt;DQ is interesting, and there are a lot of options in this space. SODA and Great Expectations are fairly well integrated with a Databricks setup.&lt;/P&gt;&lt;P&gt;I personally try to use DataFrame abstractions for validation. We used deequ, which is very simple to use: you pass your Spark DataFrame to the code, and the validations run inside your Spark session (if they need to); otherwise, the DQ logic can be decoupled into separate classes in the package. I have spent some time working with it and wrote this blog post -&amp;nbsp;&lt;A href="https://datatribe.substack.com/p/deequ-an-open-source-data-quality" target="_blank"&gt;https://datatribe.substack.com/p/deequ-an-open-source-data-quality&lt;/A&gt;&lt;BR /&gt;I would say it's a DQ tool for data engineers. Interestingly, the DataFrames that deequ produces can be written out as Delta tables to see the quality patterns. It is maintained by awslabs:&amp;nbsp;&lt;A href="https://github.com/awslabs/deequ" target="_blank"&gt;https://github.com/awslabs/deequ&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;In addition, I would like to use spark-expectations, open-sourced by Nike -&amp;nbsp;&lt;A href="https://github.com/Nike-Inc/spark-expectations" target="_blank"&gt;https://github.com/Nike-Inc/spark-expectations&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jun 2025 09:19:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-data-quality-framework-do-you-use-recomend/m-p/122101#M46652</guid>
      <dc:creator>chanukya-pekala</dc:creator>
      <dc:date>2025-06-18T09:19:01Z</dc:date>
    </item>
  </channel>
</rss>

