Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

'GeneratedAlwaysAs' along with dataframe.write

thushar
Contributor

Is it possible to use a calculated-column definition (like generatedAlwaysAs in a Delta table) while writing a DataFrame as a Delta file, e.g. df.write.format("delta")?

Are there any options with the dataframe.write method to achieve this?

6 REPLIES

Ajay-Pandey
Esteemed Contributor III

Hi @Thushar R, to my knowledge there is currently no such option in Databricks.

Ajay Kumar Pandey

Hubert-Dudek
Esteemed Contributor III

Yes, you can, since generatedAlwaysAs is part of the Delta table schema. For example, write to a table that is already registered as a Delta table:

df.write.format("delta").mode('append').saveAsTable('TableName')

There are more examples at https://docs.delta.io/latest/delta-batch.html; there are several ways to achieve this:

from delta.tables import DeltaTable

DeltaTable.create(spark) \
  .tableName("default.events") \
  .addColumn("eventId", "BIGINT") \
  .addColumn("data", "STRING") \
  .addColumn("eventType", "STRING") \
  .addColumn("eventTime", "TIMESTAMP") \
  .addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
  .addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
  .addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
  .partitionedBy("eventType", "year", "month", "day") \
  .execute()
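A minimal sketch of the follow-up write (assuming an active SparkSession named `spark`, the delta-spark package, and the `default.events` table created above): append a DataFrame that contains only the non-generated columns, and Delta computes year, month, and day from eventTime on write.

```python
# Schema for the incoming data; the generated columns (year, month, day)
# are deliberately omitted -- Delta derives them from eventTime.
EVENT_SCHEMA = "eventId BIGINT, data STRING, eventType STRING, eventTime TIMESTAMP"

def append_events(spark, rows):
    """Append rows of (eventId, data, eventType, eventTime) to default.events.

    The generated partition columns are filled in by Delta at write time.
    """
    df = spark.createDataFrame(rows, EVENT_SCHEMA)
    df.write.format("delta").mode("append").saveAsTable("default.events")
```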

Thanks, but my question is whether there is an option to specify the 'generatedAlwaysAs' clause with the dataframe.write method?

Hubert-Dudek
Esteemed Contributor III

Maybe try something like the following (note that spark.sql executes one statement at a time, so the CREATE and INSERT are issued separately):

df.createOrReplaceTempView("my_table")

spark.sql("""
CREATE TABLE rectangles (
  a INT,
  b INT,
  area INT GENERATED ALWAYS AS (a * b)
)
""")

spark.sql("""
INSERT INTO rectangles (a, b)
SELECT a, b
FROM my_table
""")

jose_gonzalez
Databricks Employee

Hi @Thushar R,

Just a friendly follow-up. Did any of the responses help you to resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.

pvignesh92
Honored Contributor

Hi @Thushar R,

This option is not part of the DataFrame write API, because the generatedAlwaysAs feature applies only to the Delta format, while df.write is a common API that handles writes for all formats.

If you want to achieve this programmatically, you can still use the DeltaTable API to create the table first, as below:

from delta.tables import DeltaTable

DeltaTable.create(spark) \
  .tableName("default.events") \
  .addColumn("eventId", "BIGINT") \
  .addColumn("data", "STRING") \
  .addColumn("eventType", "STRING") \
  .addColumn("eventTime", "TIMESTAMP") \
  .addColumn("year", "INT", generatedAlwaysAs="YEAR(eventTime)") \
  .addColumn("month", "INT", generatedAlwaysAs="MONTH(eventTime)") \
  .addColumn("day", "INT", generatedAlwaysAs="DAY(eventTime)") \
  .partitionedBy("eventType", "year", "month", "day") \
  .execute()
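An illustrative completion of this two-step pattern (the table name `default.events` comes from the snippet above; the staging view name `staging_events` is an assumption for the sketch): once the table exists, insert only the base columns and Delta computes the generated partition columns.

```python
# Base (non-generated) columns of default.events, per the builder above.
BASE_COLUMNS = ["eventId", "data", "eventType", "eventTime"]

# INSERT lists only the base columns; Delta fills year/month/day itself.
INSERT_SQL = (
    "INSERT INTO default.events (" + ", ".join(BASE_COLUMNS) + ") "
    "SELECT eventId, data, eventType, eventTime FROM staging_events"
)

def load_from_staging(spark):
    """Run the insert on an active SparkSession; generated columns are derived."""
    spark.sql(INSERT_SQL)
```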
