01-23-2023 12:41 AM
Is it possible to use a calculated column definition (like generatedAlwaysAs in a Delta table) while writing a DataFrame as a Delta file, e.g. df.write.format("delta")?
Is there any option with the dataframe.write method to achieve this?
01-23-2023 03:05 AM
Hi @Thushar R, to my knowledge there is currently no such option in Databricks.
01-23-2023 05:17 AM
Yes, you can, since generatedAlwaysAs is part of the Delta schema. For example, write to a table that is already registered as a Delta table:
df.write.format("delta").mode('append').saveAsTable('TableName')
There are some more examples at https://docs.delta.io/latest/delta-batch.html; there are several ways to achieve this, for example:
from delta.tables import DeltaTable

# Create a Delta table whose year, month, and day columns are generated
# from eventTime.
DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
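For completeness, here is a minimal sketch of the end-to-end flow, assuming the spark session of a Databricks notebook; the sample row and column values are made up for illustration. You append a DataFrame that omits the generated columns, and Delta computes year, month, and day from eventTime at write time.
from datetime import datetime

# Hypothetical sample data; the generated columns are intentionally omitted.
df = spark.createDataFrame(
    [(1, "payload", "click", datetime(2023, 1, 23, 12, 0))],
    "eventId BIGINT, data STRING, eventType STRING, eventTime TIMESTAMP",
)

# A plain DataFrame append; Delta fills in year/month/day because the
# target table was created with generatedAlwaysAs expressions.
df.write.format("delta").mode("append").saveAsTable("default.events")

spark.table("default.events").select("eventTime", "year", "month", "day").show()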
01-23-2023 05:29 AM
Thanks, but my question is whether there is an option to specify the 'generatedAlwaysAs' clause with the dataframe.write method.
01-23-2023 05:43 AM
Maybe try something like:
df.createOrReplaceTempView("my_table")

# spark.sql executes one statement per call, so the CREATE and the
# INSERT are issued separately. USING DELTA is needed because
# generated columns are a Delta feature.
spark.sql("""
    CREATE TABLE rectangles (
        a INT,
        b INT,
        area INT GENERATED ALWAYS AS (a * b)
    ) USING DELTA
""")

spark.sql("""
    INSERT INTO rectangles (a, b)
    SELECT a, b
    FROM my_table
""")
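As a quick sanity check (just a sketch), you can read the table back to confirm that Delta filled in the generated column:
spark.sql("SELECT a, b, area FROM rectangles").show()
# area should equal a * b for every row that was inserted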
02-24-2023 03:43 PM
Hi @Thushar R,
Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.
03-09-2023 06:27 AM
Hi @Thushar R,
This option is not part of the DataFrame write API, since the generatedAlwaysAs feature applies only to the Delta format, while df.write is a common API that handles writes for all formats.
If you want to achieve this programmatically, you can still use the DeltaTable API to create the table first, as below:
from delta.tables import DeltaTable

DeltaTable.create(spark) \
.tableName("default.events") \
.addColumn("eventId", "BIGINT") \
.addColumn("data", "STRING") \
.addColumn("eventType", "STRING") \
.addColumn("eventTime", "TIMESTAMP") \
.addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
.addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
.addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
.partitionedBy("eventType", "year", "month", "day") \
.execute()
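Once the table exists, the DataFrame writer works as usual. A hedged sketch of the second step (assuming df carries the four physical columns): the generated columns are left out, and Delta derives them on insert.
# Append only the non-generated columns; Delta computes year, month,
# and day from eventTime as each row is written.
df.select("eventId", "data", "eventType", "eventTime") \
    .write.format("delta") \
    .mode("append") \
    .saveAsTable("default.events")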