Ashwin_DSA
Databricks Employee
Databricks Employee

Hi @databrciks,

You can achieve this in multiple ways, as shown below.. All of these patterns work. They do a copy-on-write overwrite which means Databricks writes new files and then atomically commits a new table version. There is never a committed state where the table is empty. So, this is the safest option.

(
  df  
  .write
  .format("delta")
  .mode("overwrite")          # full refresh
  .option("overwriteSchema", "true")   # only if schema can change
  .saveAsTable("prod.report_table")
)
INSERT OVERWRITE TABLE prod.report_table
SELECT ...
FROM source_...;
CREATE OR REPLACE TABLE prod.report_table AS
SELECT ...
FROM source_...;

What you don't want to be doing is truncate and load as shown below.

TRUNCATE TABLE prod.report_table;
INSERT INTO prod.report_table SELECT ...;

This will be an anti-pattern because between TRUNCATE commit and INSERT commit, any new query sees an empty table.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

View solution in original post