Are there any plans to add functions on the partition by fields of a delta table definition such as day() ? A similar capability exists in iceberg.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ04-28-2022 12:15 PM
Benefit: This will help simplify the where clauses of the consumers of the tables? Just query on the main date field if I need all the data for a day. Not an extra day field we had to make.
- Labels:
-
Delta
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ05-07-2022 10:35 AM
@Ryan Hagerโ , yes it is possible using AUTO GENERATED COLUMNS since delta lake 1.2
For example, you can automatically generate a date column (for partitioning the table by date) from the timestamp column; any writes into the table need only specify the data for the timestamp column.
(DeltaTable.create(spark)
.tableName("default.people10m")
.addColumn("id", "INT")
.addColumn("birthDate", "TIMESTAMP")
.addColumn("dateOfBirth", DateType(), generatedAlwaysAs="CAST(birthDate AS DATE)")
.partitionedBy("dateOfBirth")
.execute())
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ05-13-2022 06:44 AM
Does this mean the execution plan for the following query that uses the original timestamp column will only scan 3 partitions and we don't have to use the dateOfBirth column in the where clause?
select id,birthDate from default.people10m
where birthDate > cast('2022-05-01 08:00:00.000000 America/Chicago' as timestamp)
and birthDate < cast('2022-05-03 08:00:00.000000 America/Chicago' as timestamp)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ08-19-2022 09:26 AM
@Kaniz Fatmaโ Can you help me get clarification on this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ02-27-2023 06:59 AM
Just to update the post, this does work:
- You have a timestamp column
- Generate a date column from the timestamp column
- Partition on that generated date column
- Write a query that filters on the original timestamp column
- Databricks will only scan partitions within the date range.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ07-11-2022 05:45 PM
Clarification is still open.

