<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Understanding Photon Row Group Skipping in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/understanding-photon-row-group-skipping/m-p/106884#M9688</link>
    <description>&lt;P&gt;Hey guys!&lt;/P&gt;&lt;P&gt;I am running a simple point query with Photon on a Liquid Clustered table to understand the statistics.&lt;/P&gt;&lt;P&gt;I see that a significant number of files have been pruned (`&lt;SPAN&gt;files pruned&lt;/SPAN&gt;`: 1104, `files read`:&lt;SPAN&gt;files read&lt;/SPAN&gt;).&lt;/P&gt;&lt;P&gt;However, I am not sure I understand what is happening at the row group level. Here are some statistics from the Spark UI:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2025-01-24 at 10.07.05.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14310i27CE6C3F8B76D276/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2025-01-24 at 10.07.05.png" alt="Screenshot 2025-01-24 at 10.07.05.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;What does "row groups skipped via lazy materialization" mean? Are the rows actually read or not? There is clearly filtering happening at the row or row group level, but I don't understand how this works in this simple case.&lt;/P&gt;&lt;P&gt;Thoughts?&lt;/P&gt;</description>
    <pubDate>Fri, 24 Jan 2025 09:09:28 GMT</pubDate>
    <dc:creator>tomvogel01</dc:creator>
    <dc:date>2025-01-24T09:09:28Z</dc:date>
    <item>
      <title>Understanding Photon Row Group Skipping</title>
      <link>https://community.databricks.com/t5/get-started-discussions/understanding-photon-row-group-skipping/m-p/106884#M9688</link>
      <description>&lt;P&gt;Hey guys!&lt;/P&gt;&lt;P&gt;I am running a simple point query with Photon on a Liquid Clustered table to understand the statistics.&lt;/P&gt;&lt;P&gt;I see that a significant number of files have been pruned (`&lt;SPAN&gt;files pruned&lt;/SPAN&gt;`: 1104, `files read`:&lt;SPAN&gt;files read&lt;/SPAN&gt;).&lt;/P&gt;&lt;P&gt;However, I am not sure I understand what is happening at the row group level. Here are some statistics from the Spark UI:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2025-01-24 at 10.07.05.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14310i27CE6C3F8B76D276/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2025-01-24 at 10.07.05.png" alt="Screenshot 2025-01-24 at 10.07.05.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;What does "row groups skipped via lazy materialization" mean? Are the rows actually read or not? There is clearly filtering happening at the row or row group level, but I don't understand how this works in this simple case.&lt;/P&gt;&lt;P&gt;Thoughts?&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2025 09:09:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/understanding-photon-row-group-skipping/m-p/106884#M9688</guid>
      <dc:creator>tomvogel01</dc:creator>
      <dc:date>2025-01-24T09:09:28Z</dc:date>
    </item>
    <item>
      <title>Re: Understanding Photon Row Group Skipping</title>
      <link>https://community.databricks.com/t5/get-started-discussions/understanding-photon-row-group-skipping/m-p/107719#M9689</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145203"&gt;@tomvogel01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"row groups skipped via lazy materialization" refers to row groups that are not physically read into memory during query execution. Photon can filter at the row group level: if a row group contains no rows that satisfy the query conditions, it is skipped entirely.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jan 2025 08:01:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/understanding-photon-row-group-skipping/m-p/107719#M9689</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-01-30T08:01:24Z</dc:date>
    </item>
  </channel>
</rss>

