GeneratedAlwaysAs along with dataframe.write
01-23-2023 12:41 AM
Is it possible to use a calculated column definition (like generatedAlwaysAs in a Delta table) while writing a DataFrame as a Delta file, e.g. with df.write.format("delta")?
Are there any options in the dataframe.write method to achieve this?
- Labels: Column, Create, Delta File, Delta table
01-23-2023 03:05 AM
Hi @Thushar R, to my knowledge there is currently no such option in Databricks.
01-23-2023 05:17 AM
Yes, you can, because generatedAlwaysAs is part of the Delta table schema. For example, write to a table that is already registered as a Delta table:
df.write.format("delta").mode('append').saveAsTable('TableName')
There are more examples at https://docs.delta.io/latest/delta-batch.html showing several ways to achieve this, for instance creating the table with generated columns first:
from delta.tables import DeltaTable

DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
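To illustrate (a minimal sketch, assuming the default.events table above exists and spark is an active SparkSession with Delta Lake available), appending a DataFrame that omits the generated columns lets Delta compute them during the write:
from pyspark.sql import functions as F

# Hypothetical sample data; year/month/day are left out on purpose,
# Delta fills them in from eventTime because they are generated columns.
events_df = spark.createDataFrame(
    [(1, "payload", "click")],
    "eventId BIGINT, data STRING, eventType STRING"
).withColumn("eventTime", F.current_timestamp())

events_df.write.format("delta").mode("append").saveAsTable("default.events")
spark.table("default.events").select("eventTime", "year", "month", "day").show()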
01-23-2023 05:29 AM
Thanks, but my question is whether there is an option to specify the 'generatedAlwaysAs' clause with the dataframe.write method?
01-23-2023 05:43 AM
Maybe try something like this:
df.createOrReplaceTempView("my_table")

# Create a Delta table with a generated column, then insert the
# non-generated columns from the temp view.
spark.sql("""
    CREATE TABLE rectangles (
        a INT,
        b INT,
        area INT GENERATED ALWAYS AS (a * b)
    ) USING DELTA
""")
spark.sql("""
    INSERT INTO rectangles (a, b)
    SELECT a, b FROM my_table
""")
02-24-2023 03:43 PM
Hi @Thushar R,
Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
03-09-2023 06:27 AM
Hi @Thushar R,
This option is not part of the DataFrame write API, because the generatedAlwaysAs feature applies only to the Delta format, while df.write is a common API that handles writes for all formats.
If you want to achieve this programmatically, you can still use the DeltaTable API to create the table first, as below, and then append your DataFrame to it (see the sketch after the example):
from delta.tables import DeltaTable

DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
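Once the table exists, a regular DataFrame write into it is enough. This is a sketch assuming a DataFrame df whose columns match the non-generated columns (eventId, data, eventType, eventTime):
# Delta computes year, month and day from eventTime during the append,
# so the DataFrame does not need to carry them.
df.write.format("delta").mode("append").saveAsTable("default.events")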

