<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Tables data quality rules application. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10874#M5928</link>
    <description>&lt;P&gt;Hi @Swapnil Kamle,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.&lt;/P&gt;</description>
    <pubDate>Fri, 24 Feb 2023 23:28:04 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2023-02-24T23:28:04Z</dc:date>
    <item>
      <title>Delta Live Tables data quality rules application.</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10869#M5923</link>
      <description>&lt;P&gt;I have a requirement to apply an inverse DQ rule on a table to track invalid data. For this, I can use the following approach:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;import dlt&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;rules = {}&lt;/P&gt;&lt;P&gt;quarantine_rules = {}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;rules["valid_website"] = "(Website IS NOT NULL)"&lt;/P&gt;&lt;P&gt;rules["valid_location"] = "(Location IS NOT NULL)"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;# concatenate inverse rules&lt;/P&gt;&lt;P&gt;quarantine_rules["invalid_record"] = "NOT({0})".format(" AND ".join(rules.values()))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@dlt.table(&lt;/P&gt;&lt;P&gt;&amp;nbsp;name="raw_farmers_market"&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;def get_farmers_market_data():&lt;/P&gt;&lt;P&gt;&amp;nbsp;return (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;spark.read.format('csv').option("header", "true")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.load('/databricks-datasets/data.gov/farmers_markets_geographic_data/data-001/')&lt;/P&gt;&lt;P&gt;&amp;nbsp;)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@dlt.table(&lt;/P&gt;&lt;P&gt;&amp;nbsp;name="valid_farmers_market"&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;@dlt.expect_all_or_drop(rules)&lt;/P&gt;&lt;P&gt;def get_valid_farmers_market():&lt;/P&gt;&lt;P&gt;&amp;nbsp;return (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;dlt.read("raw_farmers_market")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.select("MarketName", "Website", "Location", "State",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"Facebook", "Twitter", "Youtube", "Organic", "updateTime")&lt;/P&gt;&lt;P&gt;&amp;nbsp;)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@dlt.table(&lt;/P&gt;&lt;P&gt;&amp;nbsp;name="invalid_farmers_market"&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;@dlt.expect_all_or_drop(quarantine_rules)&lt;/P&gt;&lt;P&gt;def get_invalid_farmers_market():&lt;/P&gt;&lt;P&gt;&amp;nbsp;return (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;dlt.read("raw_farmers_market")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.select("MarketName", "Website", "Location", "State",&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"Facebook", "Twitter", "Youtube", "Organic", "updateTime")&lt;/P&gt;&lt;P&gt;&amp;nbsp;)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, when I store the invalid data in another table, i.e., &lt;B&gt;invalid_farmers_market&lt;/B&gt;, it contains all the rows that are invalid, while I am applying the following two rules:&lt;/P&gt;&lt;P&gt;&lt;B&gt;rules["valid_website"] = "(Website IS NOT NULL)"&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;rules["valid_location"] = "(Location IS NOT NULL)"&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I want to know whether there is any way to tell which specific rule caused a given row to land in the invalid table: &lt;B&gt;rules["valid_website"]&lt;/B&gt;, &lt;B&gt;rules["valid_location"]&lt;/B&gt;, or &lt;B&gt;both&lt;/B&gt;, so that I can take appropriate action for the specific column.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jan 2023 11:20:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10869#M5923</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2023-01-23T11:20:31Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables data quality rules application.</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10870#M5924</link>
      <description>&lt;P&gt;@Swapnil Kamle&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't think there's a way of doing that out of the box.&lt;/P&gt;&lt;P&gt;IMO the best way would be to create new Boolean columns (valid_website and valid_location), or to create a view on top of the table that exposes a true/false indicator for each rule.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jan 2023 12:42:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10870#M5924</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2023-01-23T12:42:29Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables data quality rules application.</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10871#M5925</link>
      <description>&lt;P&gt;You can get additional information from the DLT event log, which is stored in Delta format, so you can load it as a table: &lt;A href="https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#data-quality" target="test_blank"&gt;https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#data-quality&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jan 2023 13:06:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10871#M5925</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2023-01-23T13:06:33Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables data quality rules application.</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10872#M5926</link>
      <description>&lt;P&gt;Thanks for the reply. I will check if that helps.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jan 2023 17:26:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10872#M5926</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2023-01-23T17:26:09Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables data quality rules application.</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10873#M5927</link>
      <description>&lt;P&gt;Thanks for the reply. I will check how this helps in my case.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jan 2023 17:27:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10873#M5927</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2023-01-23T17:27:07Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables data quality rules application.</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10874#M5928</link>
      <description>&lt;P&gt;Hi @Swapnil Kamle,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Feb 2023 23:28:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-data-quality-rules-application/m-p/10874#M5928</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2023-02-24T23:28:04Z</dc:date>
    </item>
  </channel>
</rss>

