
Partition pruning with generated columns

andrej
New Contributor II

I have a large table which contains a date_time column.

The table also has two generated columns, year and month, which are extracted from the date_time values and used for partitioning.

I have the following question.

If I run the query

SELECT *
FROM table
WHERE date_time > '2022-07-01' AND date_time < '2022-07-09'

This query scans all the files.

If I modify the query to

SELECT *
FROM table
WHERE date_time > '2022-07-01' AND date_time < '2022-07-09'
AND year = 2022 AND month = 7

Now partition pruning is applied and the query runs roughly 20 times faster.

Since year and month are defined as expressions over date_time, I would expect pruning to be applied even when only date_time is specified in the WHERE clause.

Am I missing something in my config or is my understanding incorrect?

Thanks,

Andrej

4 REPLIES

Anonymous
Not applicable

Partition pruning will only happen when the generated columns, i.e. 'year' and 'month', are used as predicates.

You can consider file pruning by Z-ordering or by using a Bloom filter index.
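
As a rough sketch of the Z-ordering suggestion, assuming a Delta table with the hypothetical name events (the question does not give the real table name) and the date_time column from the question:

```sql
-- `events` is a hypothetical table name; date_time is the column from the question.
-- OPTIMIZE rewrites data files; ZORDER BY co-locates rows with nearby date_time
-- values, so per-file min/max statistics can skip files for range predicates.
OPTIMIZE events
ZORDER BY (date_time);
```

Z-ordering works within the existing partitions, so it complements partition pruning rather than replacing it. A Bloom filter index is generally aimed at equality lookups, so it may help less with a range filter like the one in the question.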

-werners-
Esteemed Contributor III

No, your understanding is correct.

However there are some restrictions, which you can find here (the interesting part starts at the paragraph starting with "In Databricks Runtime 8.4 and above with Photon support, Delta Lake may be able to generate partition filters...")
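
A minimal sketch of the kind of table definition this applies to, assuming a hypothetical table name events and the column names from the question. Whether the filters are actually derived depends on the runtime restrictions mentioned above, so treat the EXPLAIN step as a way to check rather than a guarantee:

```sql
-- Hypothetical table: year and month are generated from date_time
-- and used as partition columns, as described in the question.
CREATE TABLE events (
  date_time TIMESTAMP,
  year  INT GENERATED ALWAYS AS (YEAR(date_time)),
  month INT GENERATED ALWAYS AS (MONTH(date_time))
)
USING DELTA
PARTITIONED BY (year, month);

-- Inspect the plan for the query that filters on date_time only.
EXPLAIN
SELECT *
FROM events
WHERE date_time > '2022-07-01' AND date_time < '2022-07-09';
```

If partition filters on year and month show up in the scan node of the plan, pruning is being derived from the date_time predicate. If they do not, adding the year and month predicates explicitly, as in the second query of the question, remains the fallback.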

andrej
New Contributor II

Hi, thank you for the replies.

@Werner Stinckens I read that exact article, but after re-reading it I realise that Photon support is required.

Will try again with that. Thanks!

Vidula
Honored Contributor

Hi @Andrej Znidarsic

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
