Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Partition pruning with generated columns

andrej
New Contributor II

I have a large table which contains a date_time column.

The table also contains two generated columns, year and month, which are extracted from the date_time values and are used for partitioning.

I have the following question.

If I run the query

SELECT *
FROM table
WHERE date_time > '2022-07-01' AND date_time < '2022-07-09'

This query will scan all the files.

If I modify the query to

SELECT *
FROM table
WHERE date_time > '2022-07-01' AND date_time < '2022-07-09'
AND year = 2022 AND month = 7

Now pruning is applied and the query runs about 20 times faster.

Given that there is a relationship defined between date_time and the year and month columns, I would expect pruning to be applied even if only date_time is specified in the WHERE clause.

Am I missing something in my config or is my understanding incorrect?

Thanks,

Andrej

4 Replies

Anonymous
Not applicable

Partition pruning will only happen when using the generated columns i.e. ‘year’ and ‘month’ as predicates.

You can also consider file pruning, either by Z-ordering or by using a Bloom filter index.

-werners-
Esteemed Contributor III

No, your understanding is correct.

However, there are some restrictions, which you can find here (the interesting part begins at the paragraph starting with "In Databricks Runtime 8.4 and above with Photon support, Delta Lake may be able to generate partition filters...").
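To make the idea of a derived partition filter concrete, here is a minimal Python sketch of the reasoning involved: given the exclusive date_time bounds from the query in the question, it works out which (year, month) partitions could contain matching rows. This is only an illustration, not how Delta Lake implements it; the function names `partitions_for_range` and `partition_predicate` are hypothetical, and it assumes date_time has microsecond resolution.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def partitions_for_range(start: datetime, end: datetime) -> list[tuple[int, int]]:
    """Return the (year, month) partitions that can hold rows with
    start < date_time < end (both bounds exclusive, as in the query)."""
    # Assumption: timestamps have microsecond resolution, so stepping one
    # tick inside each exclusive bound gives the first and last reachable values.
    tick = timedelta(microseconds=1)
    first, last = start + tick, end - tick
    if last < first:
        return []  # the range is empty, so no partition can match

    partitions = []
    year, month = first.year, first.month
    # Walk month by month from the first reachable month to the last one.
    while (year, month) <= (last.year, last.month):
        partitions.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return partitions


def partition_predicate(start: datetime, end: datetime) -> str:
    """Render the matching partitions as a SQL predicate on year and month."""
    parts = partitions_for_range(start, end)
    if not parts:
        return "FALSE"
    return " OR ".join(f"(year = {y} AND month = {m})" for y, m in parts)


if __name__ == "__main__":
    print(partition_predicate(datetime(2022, 7, 1), datetime(2022, 7, 9)))
```

For the range in the question this yields the same year = 2022 and month = 7 condition that was added by hand. When the runtime cannot derive the filter itself, a helper along these lines could build the extra predicate for queries that are generated programmatically.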

andrej
New Contributor II

Hi, thank you for the replies.

@Werner Stinckens I read that exact article, but after re-reading it I realise that Photon support is required.

Will try again with that. Thanks!

Vidula
Honored Contributor

Hi @Andrej Znidarsic

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
