Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Operations applied when running fs.write_table to overwrite existing feature table in hive metastore

Direo
Contributor

Hi,

There was a need to query an older snapshot of a table, so I ran:

deltaTable = DeltaTable.forPath(spark, 'dbfs:/<path>')

display(deltaTable.history())

and noticed that every fs.write_table run triggers two operations:

WRITE and CREATE OR REPLACE TABLE AS SELECT. In both cases the operation mode is "append".

It would be interesting to know why two operations are triggered, and what the WRITE operation does.
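For the original need (querying an older snapshot), Delta time travel can read a specific version taken from the history output. A sketch, assuming a Databricks notebook with an active Spark session (the version number is illustrative):

```python
# Read the table as of an earlier version listed by deltaTable.history().
old_df = (spark.read
          .format("delta")
          .option("versionAsOf", 0)
          .load('dbfs:/<path>'))

display(old_df)
```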

1 REPLY

Anonymous
Not applicable

@Direo Direo:

When a DataFrame is written to a Delta table (as fs.write_table does internally), a Delta write operation is triggered. This operation performs two actions:

  1. It writes the new data to disk in the Delta format, and
  2. It atomically updates the table metadata in the transaction log.
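The two actions above can be sketched in plain Python. This is a toy illustration of the idea only, not the actual Delta implementation; the file names, log layout, and entry format are all made up:

```python
import json
import os
import tempfile

def toy_delta_write(table_dir, rows, version):
    """Toy two-phase commit: (1) write a data file, (2) atomically
    publish a transaction-log entry that references it."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)

    # Action 1: write the new data to disk (Delta uses Parquet; JSON here).
    data_file = os.path.join(table_dir, f"part-{version:05d}.json")
    with open(data_file, "w") as f:
        json.dump(rows, f)

    # Action 2: atomically update the table metadata in the transaction log.
    # Writing to a temp file and renaming makes the commit all-or-nothing:
    # readers either see the full commit or none of it.
    entry = {"version": version, "operation": "WRITE",
             "add": [os.path.basename(data_file)]}
    fd, tmp = tempfile.mkstemp(dir=log_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(entry, f)
    os.rename(tmp, os.path.join(log_dir, f"{version:020d}.json"))
    return entry

with tempfile.TemporaryDirectory() as d:
    entry = toy_delta_write(d, [{"id": 1}], version=0)
    print(entry["operation"])  # WRITE
```

Because the data files are written before the log entry is published, an incomplete write leaves no visible trace in the table.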

The CREATE OR REPLACE TABLE AS SELECT statement creates, or replaces, a table with the results of a query; in Delta Lake it does the same for a Delta table.

The WRITE operation that you see in the Delta table history corresponds to the first action of the Delta write operation: writing the new data files to disk. This action is recorded in the transaction log and can be used to replay the transaction in case of a failure.
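Replaying the log is also how an older snapshot is reconstructed: the commits up to the requested version are applied in order. A toy sketch of that replay in plain Python, with hypothetical log entries (not the real Delta log format):

```python
def snapshot_at(log, version):
    """Replay transaction-log entries up to `version` and return the
    set of data files that make up that snapshot of the table."""
    files = set()
    for entry in log:
        if entry["version"] > version:
            break
        files |= set(entry.get("add", []))    # files added by this commit
        files -= set(entry.get("remove", []))  # files logically deleted
    return files

# Hypothetical history: an overwrite removes the old file and adds a new one.
log = [
    {"version": 0, "operation": "WRITE", "add": ["part-0.parquet"]},
    {"version": 1, "operation": "CREATE OR REPLACE TABLE AS SELECT",
     "add": ["part-1.parquet"], "remove": ["part-0.parquet"]},
]

print(snapshot_at(log, 0))  # {'part-0.parquet'}
print(snapshot_at(log, 1))  # {'part-1.parquet'}
```

Note that an overwrite only logically removes old files; they stay on disk until vacuumed, which is what makes time travel to older versions possible.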

So, the WRITE operation records the actual data being written to the Delta table, while the CREATE OR REPLACE TABLE AS SELECT statement records the metadata update for the Delta table.

In summary, when you write to a Delta table, two operations are triggered: WRITE to write the actual data to disk, and CREATE OR REPLACE TABLE AS SELECT to update the table metadata in the transaction log.
