cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Are there any plans to add functions on the partition by fields of a delta table definition such as day() ? A similar capability exists in iceberg.

RyanHager
Contributor

Benefit: This will help simplify the where clauses of the consumers of the tables? Just query on the main date field if I need all the data for a day. Not an extra day field we had to make.

6 REPLIES 6

Hubert-Dudek
Esteemed Contributor III

@Ryan Hager​ , yes it is possible using AUTO GENERATED COLUMNS since delta lake 1.2

For example, you can automatically generate a date column (for partitioning the table by date) from the timestamp column; any writes into the table need only specify the data for the timestamp column.

(DeltaTable.create(spark)
.tableName("default.people10m") 
 .addColumn("id", "INT") 
 .addColumn("birthDate", "TIMESTAMP") 
 .addColumn("dateOfBirth", DateType(), generatedAlwaysAs="CAST(birthDate AS DATE)") 
 .partitionedBy("dateOfBirth") 
 .execute())

Does this mean the execution plan for the following query that uses the original timestamp column will only scan 3 partitions and we don't have to use the dateOfBirth column in the where clause?

select id,birthDate from default.people10m
where birthDate  > cast('2022-05-01 08:00:00.000000 America/Chicago' as timestamp)
and birthDate  < cast('2022-05-03 08:00:00.000000 America/Chicago' as timestamp)

@Kaniz Fatma​ Can you help me get clarification on this?

Just to update the post, this does work:

  • You have a timestamp column
  • Generate a date column from the timestamp column
  • Partition on that generated date column
  • Write a query that filters on the original timestamp column
    • Databricks will only scan partitions within the date range.

Kaniz
Community Manager
Community Manager

Hi @Ryan Hager​ ​ , Just a friendly follow-up. Do you still need help, or @Hubert Dudek (Customer)​ 's response help you to find the solution? Please let us know.

Clarification is still open.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.