
Understanding Photon Row Group Skipping

tomvogel01
New Contributor II

Hey guys!

I am using Photon to run a simple point query on a Liquid Clustered table in order to understand the scan statistics.

I see that a significant number of files have been pruned (`files pruned`: 1104).

However, I am not sure I understand what is happening at the row group level. Here are some statistics from the Spark UI:

[Screenshot of the Spark UI scan statistics, 2025-01-24]

What does "row groups skipped via lazy materialization" mean? Are the rows actually read or not? There is clearly filtering happening at the row or row group level, but I don't understand how it works in this simple case.

Thoughts?


Sidhant07
Databricks Employee

Hi @tomvogel01 ,

"Row groups skipped via lazy materialization" refers to row groups that are not physically read into memory during query execution. This happens because Photon can apply the query's filter at the row group level: if a row group contains no rows that satisfy the query conditions, it can be skipped entirely.
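To make the distinction concrete, here is a toy simulation in plain Python. It is a hypothetical sketch, not Photon's implementation: the names `RowGroup`, `ScanMetrics` and `point_query` are made up for illustration, and it assumes the metric follows the usual lazy-materialization pattern. Under that assumption, a row group first survives min/max statistics skipping, then only the filter column is read and the predicate evaluated, and the remaining columns are read only if some row actually matches. A row group whose statistics could not rule it out, but where no row passes the filter, is counted as "skipped via lazy materialization".

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    # Hypothetical stand-in for a Parquet row group: column name -> values.
    columns: dict[str, list]
    # Per-column (min, max) statistics, like those stored in a Parquet footer.
    stats: dict[str, tuple] = field(init=False)

    def __post_init__(self):
        self.stats = {name: (min(vals), max(vals)) for name, vals in self.columns.items()}


@dataclass
class ScanMetrics:
    skipped_by_stats: int = 0
    skipped_by_lazy_materialization: int = 0
    row_groups_materialized: int = 0
    columns_read: list = field(default_factory=list)  # (row group index, column)


def point_query(row_groups, filter_col, target, output_cols):
    """Toy scan for `filter_col == target` (illustrative only, not Photon)."""
    metrics = ScanMetrics()
    result = []
    for i, rg in enumerate(row_groups):
        lo, hi = rg.stats[filter_col]
        # 1) Min/max skipping: target is outside the range, nothing is read.
        if not (lo <= target <= hi):
            metrics.skipped_by_stats += 1
            continue
        # 2) Lazy materialization: read only the filter column, evaluate the predicate.
        metrics.columns_read.append((i, filter_col))
        matches = [j for j, v in enumerate(rg.columns[filter_col]) if v == target]
        if not matches:
            # Stats could not rule this row group out, but no row passed the
            # filter, so the other columns are never read.
            metrics.skipped_by_lazy_materialization += 1
            continue
        # 3) Only now read the remaining columns, and only for the matching rows.
        metrics.row_groups_materialized += 1
        for col in output_cols:
            if col != filter_col:
                metrics.columns_read.append((i, col))
        for j in matches:
            result.append({col: rg.columns[col][j] for col in output_cols})
    return result, metrics


row_groups = [
    RowGroup({"id": [1, 2, 3], "name": ["a", "b", "c"]}),      # range 1..3
    RowGroup({"id": [10, 20, 30], "name": ["d", "e", "f"]}),   # range 10..30, no 15
    RowGroup({"id": [14, 15, 16], "name": ["g", "h", "i"]}),   # contains 15
]
rows, metrics = point_query(row_groups, "id", 15, ["id", "name"])
print(rows)
print(metrics)
```

In this toy run, the first row group is skipped by its statistics alone, the second is in range but has no matching row (so only `id` is read and `name` is never touched), and only the third is fully materialized. So, to the question "are the rows actually read?": under this model, for lazily skipped row groups the filter column is read but the other columns are not.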
