01-23-2023 12:41 AM
Is it possible to use a calculated column definition (like generatedAlwaysAs in a Delta table) while writing a DataFrame as a Delta file, e.g. df.write.format("delta")?
Is there any option with the dataframe.write method to achieve this?
01-23-2023 03:05 AM
Hi @Thushar R, to my knowledge there is currently no such option in Databricks.
01-23-2023 05:17 AM
Yes, you can, since generatedAlwaysAs is part of the Delta schema. For example, write to a table that is already registered as a Delta table:
df.write.format("delta").mode('append').saveAsTable('TableName')
There are some more examples at https://docs.delta.io/latest/delta-batch.html; there are several ways to achieve this, for example:
from delta.tables import DeltaTable

# Create a Delta table whose year, month, and day columns are generated
# from eventTime.
DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
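For completeness, here is a minimal sketch of the end-to-end flow, assuming the spark session of a Databricks notebook; the sample row and column values are made up for illustration. You append a DataFrame that omits the generated columns, and Delta computes year, month, and day from eventTime at write time.
from datetime import datetime

# Hypothetical sample data; the generated columns are intentionally omitted.
df = spark.createDataFrame(
    [(1, "payload", "click", datetime(2023, 1, 23, 12, 0))],
    "eventId BIGINT, data STRING, eventType STRING, eventTime TIMESTAMP",
)

# A plain DataFrame append; Delta fills in year/month/day because the
# target table was created with generatedAlwaysAs expressions.
df.write.format("delta").mode("append").saveAsTable("default.events")

spark.table("default.events").select("eventTime", "year", "month", "day").show()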
01-23-2023 05:29 AM
Thanks, but my question is whether there is an option to specify the 'generatedAlwaysAs' clause with the dataframe.write method.
01-23-2023 05:43 AM
Maybe try something like:
df.createOrReplaceTempView("my_table")

# spark.sql executes one statement per call, so the CREATE and the
# INSERT are issued separately. USING DELTA is needed because
# generated columns are a Delta feature.
spark.sql("""
    CREATE TABLE rectangles (
        a INT,
        b INT,
        area INT GENERATED ALWAYS AS (a * b)
    ) USING DELTA
""")

spark.sql("""
    INSERT INTO rectangles (a, b)
    SELECT a, b
    FROM my_table
""")
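As a quick sanity check (just a sketch), you can read the table back to confirm that Delta filled in the generated column:
spark.sql("SELECT a, b, area FROM rectangles").show()
# area should equal a * b for every row that was inserted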
02-24-2023 03:43 PM
Hi @Thushar R,
Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.
03-09-2023 06:27 AM
Hi @Thushar R,
This option is not part of the DataFrame write API, since the generatedAlwaysAs feature applies only to the Delta format, while df.write is a common API that handles writes for all formats.
If you want to achieve this programmatically, you can still use the DeltaTable API to create the table first, as below:
from delta.tables import DeltaTable

DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
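Once the table exists, the DataFrame writer works as usual. A hedged sketch of the second step (assuming df carries the four physical columns): the generated columns are left out, and Delta derives them on insert.
# Append only the non-generated columns; Delta computes year, month,
# and day from eventTime as each row is written.
df.select("eventId", "data", "eventType", "eventTime") \
    .write.format("delta") \
    .mode("append") \
    .saveAsTable("default.events")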