<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Partition pruning with generated columns in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13510#M8183</link>
    <description>&lt;P&gt;Partition pruning happens only when the generated columns, i.e. ‘year’ and ‘month’, are used as predicates.&lt;/P&gt;&lt;P&gt;You can also consider file pruning with Z-ordering or a Bloom filter index.&lt;/P&gt;</description>
    <pubDate>Thu, 14 Jul 2022 14:56:26 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2022-07-14T14:56:26Z</dc:date>
    <item>
      <title>Partition pruning with generated columns</title>
      <link>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13509#M8182</link>
      <description>&lt;P&gt;I have a large table that contains a date_time column.&lt;/P&gt;&lt;P&gt;The table also has two generated columns, year and month, which are extracted from the date_time values and used for partitioning.&lt;/P&gt;&lt;P&gt;I have the following question.&lt;/P&gt;&lt;P&gt;If I run the query&lt;/P&gt;&lt;P&gt;SELECT *&lt;/P&gt;&lt;P&gt;FROM table&lt;/P&gt;&lt;P&gt;WHERE date_time &amp;gt; '2022-07-01' AND date_time &amp;lt; '2022-07-09'&lt;/P&gt;&lt;P&gt;it scans all the files.&lt;/P&gt;&lt;P&gt;If I modify the query to&lt;/P&gt;&lt;P&gt;SELECT *&lt;/P&gt;&lt;P&gt;FROM table&lt;/P&gt;&lt;P&gt;WHERE date_time &amp;gt; '2022-07-01' AND date_time &amp;lt; '2022-07-09'&lt;/P&gt;&lt;P&gt;AND year = 2022 AND month = 7&lt;/P&gt;&lt;P&gt;pruning is applied and the query runs about 20 times faster.&lt;/P&gt;&lt;P&gt;Given that a relationship is defined between date_time and the year and month columns, I would expect pruning to be applied even if only date_time is specified in the WHERE clause.&lt;/P&gt;&lt;P&gt;Am I missing something in my config, or is my understanding incorrect?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Andrej&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2022 14:28:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13509#M8182</guid>
      <dc:creator>andrej</dc:creator>
      <dc:date>2022-07-14T14:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: Partition pruning with generated columns</title>
      <link>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13510#M8183</link>
      <description>&lt;P&gt;Partition pruning happens only when the generated columns, i.e. ‘year’ and ‘month’, are used as predicates.&lt;/P&gt;&lt;P&gt;You can also consider file pruning with Z-ordering or a Bloom filter index.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2022 14:56:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13510#M8183</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-07-14T14:56:26Z</dc:date>
    </item>
    <item>
      <title>Re: Partition pruning with generated columns</title>
      <link>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13511#M8184</link>
      <description>&lt;P&gt;No, your understanding is correct.&lt;/P&gt;&lt;P&gt;However, there are some restrictions, which you can find &lt;A href="https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#scala-4" alt="https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#scala-4" target="_blank"&gt;here&lt;/A&gt; (the relevant part is the paragraph that begins "&lt;I&gt;In Databricks Runtime 8.4 and above with Photon support, Delta Lake may be able to generate partition filters...&lt;/I&gt;").&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2022 14:58:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13511#M8184</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-07-14T14:58:11Z</dc:date>
    </item>
    <item>
      <title>Re: Partition pruning with generated columns</title>
      <link>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13512#M8185</link>
      <description>&lt;P&gt;Hi, thank you for the replies.&lt;/P&gt;&lt;P&gt;@Werner Stinckens I read that exact article, but after re-reading it I realise that Photon support is required.&lt;/P&gt;&lt;P&gt;I will try again with that. Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2022 15:14:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13512#M8185</guid>
      <dc:creator>andrej</dc:creator>
      <dc:date>2022-07-14T15:14:25Z</dc:date>
    </item>
    <item>
      <title>Re: Partition pruning with generated columns</title>
      <link>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13513#M8186</link>
      <description>&lt;P&gt;Hi @Andrej Znidarsic&lt;/P&gt;&lt;P&gt;Hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 04 Sep 2022 14:04:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/partition-pruning-with-generated-columns/m-p/13513#M8186</guid>
      <dc:creator>Vidula</dc:creator>
      <dc:date>2022-09-04T14:04:54Z</dc:date>
    </item>
  </channel>
</rss>

