cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Genrated partition column not being used by optimizer

AlexDavies
Contributor

We have created a table using the new generated column feature (https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#deltausegeneratedcolumns)

CREATE TABLE ingest.MyEvent(
 data  binary,
 topic string,
 timestamp timestamp,
 date date GENERATED ALWAYS AS (CAST(timestamp AS DATE))
)
USING DELTA
PARTITIONED BY (topic,date)
LOCATION '/mnt/data/myevent'

When running

SELECT * FROM WHERE timestamp > '2021-06-01T12:34'

We were expecting it to push down a filter "date >= '2021-06-01'" so that it can make use of the partitioned column without having to explicit query it. However it does not push down a filter on date, so doesn't benefit from partition filtering

Is it expected that generated columns would be used to optimize queries? Is it possible to make generated columns work the way i'm expecting or am I expecting too much?

1 REPLY 1

-werners-
Esteemed Contributor III

I think you have to pass a date in your select query instead of a timestamp.

The generated column will indeed derive a data from the timestamp and partition by it.

But the docs state:

When you write to a table with generated columns and you do not explicitly provide values for them, Delta Lake automatically computes the values.

So the date column will be filled while writing. But afaik it is not the case with reading.

So "Select timestamp" will not be translated to "Select date".

When you query on the timestamp column, spark will read the timestamp column, and the table is not partitioned on this.

try this f.e.:
SELECT * FROM WHERE date  = '2021-06-01'

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group