cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Genrated partition column not being used by optimizer

AlexDavies
Contributor

We have created a table using the new generated column feature (https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#deltausegeneratedcolumns)

CREATE TABLE ingest.MyEvent(
 data  binary,
 topic string,
 timestamp timestamp,
 date date GENERATED ALWAYS AS (CAST(timestamp AS DATE))
)
USING DELTA
PARTITIONED BY (topic,date)
LOCATION '/mnt/data/myevent'

When running

SELECT * FROM WHERE timestamp > '2021-06-01T12:34'

We were expecting it to push down a filter "date >= '2021-06-01'" so that it can make use of the partitioned column without having to explicit query it. However it does not push down a filter on date, so doesn't benefit from partition filtering

Is it expected that generated columns would be used to optimize queries? Is it possible to make generated columns work the way i'm expecting or am I expecting too much?

1 REPLY 1

-werners-
Esteemed Contributor III

I think you have to pass a date in your select query instead of a timestamp.

The generated column will indeed derive a data from the timestamp and partition by it.

But the docs state:

When you write to a table with generated columns and you do not explicitly provide values for them, Delta Lake automatically computes the values.

So the date column will be filled while writing. But afaik it is not the case with reading.

So "Select timestamp" will not be translated to "Select date".

When you query on the timestamp column, spark will read the timestamp column, and the table is not partitioned on this.

try this f.e.:
SELECT * FROM WHERE date  = '2021-06-01'

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.