01-23-2023 12:41 AM
Is it possible to use a calculated column definition (like generatedAlwaysAs in a Delta table) while writing a DataFrame as a Delta file, e.g. df.write.format("delta")?
Are there any options with the DataFrame.write method to achieve this?
01-23-2023 03:05 AM
Hi @Thushar R, to my knowledge there is currently no such option in Databricks.
01-23-2023 05:17 AM
Yes, you can, since generatedAlwaysAs is part of the Delta table schema. For example, write to a table that is already registered as a Delta table:
df.write.format("delta").mode('append').saveAsTable('TableName')
There are more examples at https://docs.delta.io/latest/delta-batch.html; there are several ways to achieve this, for example:
from delta.tables import DeltaTable

DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
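To show the generated columns in action, here is a minimal sketch (the sample row and the active spark session are assumptions, not part of the docs example): append a DataFrame that has only the base columns, and Delta computes year, month, and day on write.

from pyspark.sql import functions as F

# Hypothetical sample row matching the default.events schema created above
events_df = spark.createDataFrame(
    [(1, "payload", "click", "2023-01-23 12:41:00")],
    "eventId BIGINT, data STRING, eventType STRING, eventTime STRING",
).withColumn("eventTime", F.to_timestamp("eventTime"))

# Append only the base columns; Delta fills in the generated year/month/day
events_df.write.format("delta").mode("append").saveAsTable("default.events")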
01-23-2023 05:29 AM
Thanks, but my question is whether there is an option to specify the 'generatedAlwaysAs' clause with the dataframe.write method?
01-23-2023 05:43 AM
Maybe try something like:
df.createOrReplaceTempView("my_table")

spark.sql("""
  CREATE TABLE rectangles (
    a INT,
    b INT,
    area INT GENERATED ALWAYS AS (a * b)
  ) USING DELTA
""")

spark.sql("""
  INSERT INTO rectangles (a, b)
  SELECT a, b
  FROM my_table
""")

Note that generated columns require the Delta format (the default on Databricks), and spark.sql executes a single statement per call.
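If the two statements above succeed, a quick check should show the computed column (assuming my_table holds columns a and b):

spark.sql("SELECT a, b, area FROM rectangles").show()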
02-24-2023 03:43 PM
Hi @Thushar R,
Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
03-09-2023 06:27 AM
Hi @Thushar R,
This option is not part of the DataFrame write API, since the generatedAlwaysAs feature applies only to the Delta format while df.write is a common API that handles writes for all formats.
If you want to achieve this programmatically, you can still use the DeltaTable API to create the table first, as below, and then append to it (see the write step after the snippet):
from delta.tables import DeltaTable

DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
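Once the table exists, writing the DataFrame is a normal append, and Delta computes the generated year/month/day columns on write (a sketch, assuming df contains the base columns eventId, data, eventType, and eventTime):

df.write.format("delta").mode("append").saveAsTable("default.events")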