'GeneratedAlwaysAs' along with dataframe.write

thushar
Contributor

Is it possible to use a calculated column definition (like generatedAlwaysAs in a Delta table) while writing a data frame as a Delta file, e.g. df.write.format("delta")?

Are there any options with the dataframe.write method to achieve this?

6 REPLIES

Ajay-Pandey
Esteemed Contributor II

Hi @Thushar R, to my knowledge there is currently no such option in Databricks.

Hubert-Dudek
Esteemed Contributor III

Yes, you can, since generatedAlwaysAs is part of the Delta table schema. For example, write to a table that is already registered as a Delta table:

df.write.format("delta").mode('append').saveAsTable('TableName')

There are some more examples at https://docs.delta.io/latest/delta-batch.html; there are a lot of ways to achieve this, for example:

DeltaTable.create(spark) \
  .tableName("default.events") \
  .addColumn("eventId", "BIGINT") \
  .addColumn("data", "STRING") \
  .addColumn("eventType", "STRING") \
  .addColumn("eventTime", "TIMESTAMP") \
  .addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
  .addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
  .addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
  .partitionedBy("eventType", "year", "month", "day") \
  .execute()
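
Once a table with generated columns exists, an ordinary DataFrame append fills them in automatically. As a minimal sketch (assuming the default.events table created above; the sample row is made up), you supply only the non-generated columns and Delta computes year/month/day from eventTime:

from pyspark.sql.functions import current_timestamp
 
# Hypothetical sample data covering only the non-generated columns;
# year/month/day are omitted and computed by Delta on write.
events_df = (
    spark.createDataFrame([(1, "payload", "click")], ["eventId", "data", "eventType"])
    .withColumn("eventTime", current_timestamp())
)
 
events_df.write.format("delta").mode("append").saveAsTable("default.events")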

Thanks, but my question is whether there is an option to specify the 'generatedAlwaysAs' clause with the dataframe.write method?

Hubert-Dudek
Esteemed Contributor III

Maybe try something like

df.createOrReplaceTempView("my_table")
 
# Create the table with the generated column, then insert only a and b;
# area is computed by Delta automatically.
spark.sql("""
CREATE TABLE rectangles (
  a INT,
  b INT,
  area INT GENERATED ALWAYS AS (a * b)
)
""")
 
spark.sql("""
INSERT INTO rectangles (a, b)
SELECT a, b FROM my_table
""")
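
To confirm the generated column is populated (a quick check, assuming the rectangles table from the snippet above):

spark.sql("SELECT a, b, area FROM rectangles").show()
# Each inserted row should show area = a * b, computed by Delta at write time.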

jose_gonzalez
Moderator

Hi @Thushar R,

Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.

pvignesh92
Honored Contributor

Hi @Thushar R​ ,

This option is not part of the DataFrame write API, as the generatedAlwaysAs feature is only applicable to the Delta format, while df.write is a common API that handles writes for all formats.

If you want to achieve this programmatically, you can still use the DeltaTable API to create the table first, as below, and then append your DataFrame to it (see the sketch after the snippet).

DeltaTable.create(spark) \
  .tableName("default.events") \
  .addColumn("eventId", "BIGINT") \
  .addColumn("data", "STRING") \
  .addColumn("eventType", "STRING") \
  .addColumn("eventTime", "TIMESTAMP") \
  .addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
  .addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
  .addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
  .partitionedBy("eventType", "year", "month", "day") \
  .execute()
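
Once the table exists, the DataFrame write itself stays a plain append. A minimal sketch, assuming df holds only the non-generated columns (eventId, data, eventType, eventTime):

# Delta fills year/month/day from eventTime because they are generated columns.
df.write.format("delta").mode("append").saveAsTable("default.events")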
