Databricks Community

tomvogel01 · ‎01-24-2025

Hey guys!

I am using Photon to do a simple point query on a Liquid Clustered table with the purpose of understanding the statistics.

I see that a significant number of files have been pruned (`files pruned`: 1104, `files read`:files read).

However, I am not sure I understand what is happening at the row group level. Here are some statistics from the Spark UI:

[Screenshot: Spark UI scan statistics, "Screenshot 2025-01-24 at 10.07.05.png"]

What does "row groups skipped via lazy materialization" mean? Are the rows actually read or not? There is clearly filtering happening at the row or row group level but I don't understand how this works in this simple case.

Thoughts?

Sidhant07 · ‎01-30-2025

Hi @tomvogel01 ,

"row groups skipped via lazy materialization" refers to row groups that were not physically read into memory during query execution. Photon can filter at the row group level: if a row group does not contain any rows that satisfy the query conditions, it is skipped entirely.
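To make the idea concrete, here is a minimal toy sketch in plain Python of how row-group skipping and lazy (late) materialization are commonly described for columnar readers. It is not Photon's actual implementation, and it assumes a three-step flow (check statistics, read only the filter column, then read the remaining columns). The `RowGroup` class, the `point_query` function, and the metric names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group (hypothetical, for illustration only)."""
    ids: list      # column used in the predicate
    payload: list  # another column, only needed for rows that match


def point_query(row_groups, target):
    """Evaluate `id == target` and count how each row group was handled."""
    metrics = {"skipped_by_stats": 0, "skipped_after_filter": 0, "materialized": 0}
    rows = []
    for rg in row_groups:
        # Step 1: min/max statistics can rule out a row group up front.
        # (Computed from the data here for brevity; a real reader would take
        # them from file metadata without touching the column data.)
        if target < min(rg.ids) or target > max(rg.ids):
            metrics["skipped_by_stats"] += 1
            continue
        # Step 2: read only the filter column and evaluate the predicate.
        hits = [i for i, value in enumerate(rg.ids) if value == target]
        if not hits:
            # No row qualifies, so the payload column is never read.
            metrics["skipped_after_filter"] += 1
            continue
        # Step 3: materialize the remaining column only for matching rows.
        metrics["materialized"] += 1
        rows.extend((rg.ids[i], rg.payload[i]) for i in hits)
    return rows, metrics


groups = [
    RowGroup(ids=[1, 2, 3], payload=["a", "b", "c"]),  # statistics exclude 5
    RowGroup(ids=[4, 6, 8], payload=["d", "e", "f"]),  # range covers 5, no row equals 5
    RowGroup(ids=[5, 7, 9], payload=["g", "h", "i"]),  # contains 5
]
rows, metrics = point_query(groups, 5)
print(rows, metrics)
```

In this toy model, the second row group is the interesting case: its value range cannot exclude the lookup key, so the filter column is read, but since no row matches, the other columns are never decoded. Whether the Spark UI metric corresponds exactly to that step in Photon is an assumption on my part.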


Understanding Photon Row Group Skipping
