Understanding Photon Row Group Skipping

tomvogel01 — Fri, 24 Jan 2025 09:09:28 GMT

Hey guys!

I am running a simple point query with Photon on a Liquid Clustered table to understand the scan statistics.

I see that a significant number of files have been pruned (`files pruned`: 1104, `files read`: …).

However, I am not sure I understand what is happening at the row group level. Here are some statistics from the Spark UI:

What does "row groups skipped via lazy materialization" mean? Are those rows actually read or not? There is clearly filtering happening at the row or row group level, but I don't understand how it works in this simple case.

Thoughts?

Re: Understanding Photon Row Group Skipping

Sidhant07 — Thu, 30 Jan 2025 08:01:24 GMT

Hi @tomvogel01,

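"Row groups skipped via lazy materialization" counts row groups that were not fully read into memory during query execution. With lazy materialization, Photon first reads the columns referenced in the filter and evaluates the predicate on them; the remaining columns are only decoded for rows that pass. If no row in a row group satisfies the query conditions, the rest of that row group is skipped entirely. This is different from skipping based on min/max statistics, which can rule out a row group (or, as with your 1104 pruned files, a whole file) without reading any of its data.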
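To make the distinction concrete, here is a toy model of a point query scan. It is only an illustration of the general idea, not Photon's actual implementation: `RowGroup`, `scan_point_query`, and the metric names are hypothetical, and the min/max statistics are computed from the values here, whereas a real Parquet reader gets them from the file footer without touching the data. Stage 1 skips row groups whose statistics exclude the key; stage 2 decodes only the filter column and skips the other columns when nothing matches (the "lazy materialization" case).

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    # Hypothetical stand-in for a Parquet row group: column name -> values.
    columns: dict
    # Records which columns were actually "decoded" during the scan.
    reads: list = field(default_factory=list)

    def stats(self, col):
        # Stand-in for footer min/max statistics; not recorded as a data read.
        values = self.columns[col]
        return min(values), max(values)

    def read(self, col):
        self.reads.append(col)
        return self.columns[col]


def scan_point_query(row_groups, filter_col, key, output_cols):
    """Toy scan for `WHERE filter_col = key` with two kinds of skipping."""
    metrics = {"skipped_by_stats": 0, "skipped_lazy": 0, "read": 0}
    results = []
    for rg in row_groups:
        lo, hi = rg.stats(filter_col)
        # Stage 1: min/max statistics rule the row group out without reading data.
        if not (lo <= key <= hi):
            metrics["skipped_by_stats"] += 1
            continue
        # Stage 2: decode only the filter column and evaluate the predicate.
        keys = rg.read(filter_col)
        matches = [i for i, v in enumerate(keys) if v == key]
        if not matches:
            # Stats could not exclude this group, but no row passed the filter,
            # so the remaining columns are never decoded.
            metrics["skipped_lazy"] += 1
            continue
        # Stage 3: materialize the other columns only for the matching rows.
        metrics["read"] += 1
        cols = {c: rg.read(c) for c in output_cols}
        for i in matches:
            results.append({c: cols[c][i] for c in output_cols})
    return results, metrics


groups = [
    RowGroup({"id": [1, 2, 3], "payload": ["a", "b", "c"]}),        # excluded by stats
    RowGroup({"id": [10, 20, 30], "payload": ["d", "e", "f"]}),     # 15 in range, absent
    RowGroup({"id": [14, 15, 16], "payload": ["g", "h", "i"]}),     # contains 15
]
rows, metrics = scan_point_query(groups, "id", 15, ["payload"])
print(rows, metrics)
```

In this example the first group is skipped from statistics alone, the second has only its `id` column decoded before being skipped, and only the third has its `payload` column read. That middle case is what a counter like "row groups skipped via lazy materialization" would capture: the filter column was read, but the rest of the row group was not.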

