Overwriting the existing table in Databricks; Mechanism and History?

Mado · ‎03-08-2023

Hi,

Assume that I have a Delta table stored in an Azure storage account. When new records arrive, I repeat the transformation and overwrite the existing table.

    (DF.write
        .format("delta")
        .mode("overwrite")
        .option("path", save_path)
        .save())

I have 2 questions in this regard:

1. What is the mechanism of overwriting?

Does it truncate the table and insert new records?

2. If any overwriting operation fails, how can I know that?

Assume that the dataset is large, so a failed overwrite cannot be identified just by looking at the table records.

Is there any log or history that shows whether the latest overwrite was successful?

-werners- · ‎03-09-2023

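The overwrite adds new files and keeps the old ones; the transaction log keeps track of which files are the current data and which are old. So it does not truncate the table and re-insert: the old files are only marked as removed in the log.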
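To make that concrete, here is a toy model (not Delta's actual code) of how replaying a log of add/remove actions decides which files are current. The file names are made up, and real commit entries under `_delta_log` carry many more fields than shown here.

```python
def current_files(commits):
    """Replay commit actions in order and return the set of live data files."""
    live = set()
    for commit in commits:
        for action in commit:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                # The file is only dropped from the current state; it stays on storage.
                live.discard(action["remove"]["path"])
    return live


# Hypothetical two-commit log: an initial write followed by an overwrite.
log = [
    [{"add": {"path": "part-0000-a.parquet"}}],      # version 0: initial write
    [{"remove": {"path": "part-0000-a.parquet"}},    # version 1: overwrite
     {"add": {"path": "part-0000-b.parquet"}}],
]

print(current_files(log))      # current state after the overwrite
print(current_files(log[:1]))  # state as of version 0
```

Replaying only the first commit still yields the old file, which is why earlier versions stay readable until the old files are cleaned up (for example by VACUUM).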

If the overwrite fails, you will get an error message in the Spark program, and the data that was to be overwritten will still be the current state of the table.
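A minimal sketch of checking this from PySpark, assuming `spark`, `DF`, and `save_path` exist as in the question. `describe_latest_commit` and `latest_commit` are hypothetical helper names; `DeltaTable.forPath(...).history(n)` is the delta-spark API, and `DESCRIBE HISTORY` gives the same information in SQL. Only committed operations show up in the history, so a failed overwrite leaves the latest version unchanged. For a path-based overwrite like the one above I would expect the operation to show as `WRITE` with mode `Overwrite`, but treat the exact strings as something to verify on your runtime.

```python
from typing import Any, Mapping, Optional


def describe_latest_commit(row: Mapping[str, Any]) -> str:
    """Summarize one row of Delta table history (passed as a plain dict)."""
    params = row.get("operationParameters") or {}
    mode = params.get("mode", "n/a")
    return (
        f"version={row['version']} operation={row['operation']} "
        f"mode={mode} at {row['timestamp']}"
    )


def latest_commit(spark, save_path: str) -> Optional[dict]:
    """Return the most recent committed operation on the table, or None."""
    # Imported lazily so the helper above works without delta-spark installed.
    from delta.tables import DeltaTable

    rows = DeltaTable.forPath(spark, save_path).history(1).collect()
    return rows[0].asDict() if rows else None


# Usage in a notebook where `spark`, `DF`, and `save_path` already exist:
# try:
#     DF.write.format("delta").mode("overwrite").option("path", save_path).save()
# except Exception as err:
#     print(f"Overwrite failed, table is still at its previous version: {err}")
#     raise
# print(describe_latest_commit(latest_commit(spark, save_path)))
```

Comparing the version number and timestamp before and after the job is a simple way to confirm that the latest overwrite was committed.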

Mado · ‎03-09-2023

Thanks.

Is this explained in the documentation? If so, could you share a link?
