04-28-2022 12:15 PM
05-07-2022 10:35 AM
@Ryan Hager, yes, this is possible using generated columns, available since Delta Lake 1.2.
For example, you can automatically generate a date column (for partitioning the table by date) from the timestamp column; any writes into the table need only specify the data for the timestamp column.
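from delta.tables import DeltaTable
from pyspark.sql.types import DateType

# `spark` is the active SparkSession (predefined in Databricks notebooks)
(DeltaTable.create(spark)
    .tableName("default.people10m")
    .addColumn("id", "INT")
    .addColumn("birthDate", "TIMESTAMP")
    # dateOfBirth is always computed from birthDate, never supplied by the writer
    .addColumn("dateOfBirth", DateType(), generatedAlwaysAs="CAST(birthDate AS DATE)")
    .partitionedBy("dateOfBirth")
    .execute())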
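To make the idea concrete without a cluster, here is a rough plain-Python model of what the generated column does on write. It is only an illustration, not Delta Lake code: the helper name with_generated_date and the sample rows are made up for this sketch. The writer supplies id and birthDate only, and the date is derived for every row before the rows are grouped by partition value.

```python
from datetime import datetime, date

def with_generated_date(rows):
    """Mimic GENERATED ALWAYS AS (CAST(birthDate AS DATE)) for each incoming row.

    Hypothetical helper for illustration; Delta Lake does this internally.
    """
    return [{**row, "dateOfBirth": row["birthDate"].date()} for row in rows]

# The writer supplies only id and birthDate (sample data, invented for the sketch).
incoming = [
    {"id": 1, "birthDate": datetime(2022, 5, 1, 9, 30)},
    {"id": 2, "birthDate": datetime(2022, 5, 1, 23, 59)},
    {"id": 3, "birthDate": datetime(2022, 5, 2, 0, 0)},
]
stored = with_generated_date(incoming)

# Group row ids by the derived partition value, as partitionedBy("dateOfBirth") would.
partitions = {}
for row in stored:
    partitions.setdefault(row["dateOfBirth"], []).append(row["id"])

print(partitions)
```

Rows 1 and 2 land in the 2022-05-01 partition and row 3 in 2022-05-02, even though no writer ever set dateOfBirth.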
05-13-2022 06:44 AM
Does this mean that the execution plan for the following query, which filters on the original timestamp column, will scan only 3 partitions, so we don't have to use the dateOfBirth column in the WHERE clause?
select id,birthDate from default.people10m
where birthDate > cast('2022-05-01 08:00:00.000000 America/Chicago' as timestamp)
and birthDate < cast('2022-05-03 08:00:00.000000 America/Chicago' as timestamp)
08-19-2022 09:26 AM
@Kaniz Fatma Can you help me get clarification on this?
02-27-2023 06:59 AM
Just to update the post, this does work:
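One way to reason about why, sketched in plain Python rather than Spark: when the partition column is defined as CAST(birthDate AS DATE), a range filter on birthDate implies a range on dateOfBirth, so only the partitions inside that date range need to be read. The function name partitions_for_range is made up for this sketch, and the fixed UTC-5 offset is an assumption standing in for a session time zone of America/Chicago in May; a different session time zone can shift the dates.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the session time zone is America/Chicago, which is UTC-5 (CDT) in May.
SESSION_TZ = timezone(timedelta(hours=-5))

def partitions_for_range(lower, upper):
    """Dates whose partitions can hold rows with lower < birthDate < upper.

    Hypothetical helper: it mirrors the derived filter
    dateOfBirth >= CAST(lower AS DATE) AND dateOfBirth <= CAST(upper AS DATE).
    """
    first = lower.astimezone(SESSION_TZ).date()
    last = upper.astimezone(SESSION_TZ).date()
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]

# The bounds from the query in this thread.
lower = datetime(2022, 5, 1, 8, 0, tzinfo=SESSION_TZ)
upper = datetime(2022, 5, 3, 8, 0, tzinfo=SESSION_TZ)

candidate_partitions = partitions_for_range(lower, upper)
print([d.isoformat() for d in candidate_partitions])
# → ['2022-05-01', '2022-05-02', '2022-05-03']
```

That gives the 3 partitions asked about above. On a real table, the physical plan from EXPLAIN on the same query is the place to confirm that a partition filter on dateOfBirth was added.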
05-13-2022 06:05 AM
Hi @Ryan Hager, just a friendly follow-up. Do you still need help, or did @Hubert Dudek's response help you find the solution? Please let us know.
07-11-2022 05:45 PM
Clarification is still open.