Monday
Hello Experts
How do I run an everyday full load (truncate and load) into a Delta table? Reports are pointing to this table.
How do I ensure zero downtime while the load is running?
Thanks
Monday
@databrciks Load your new data into a staging table, then replace the target using CREATE OR REPLACE TABLE or INSERT OVERWRITE.
Delta Lake writes are atomic, so reports keep reading the old data until the new load finishes. There is no downtime and no empty table window. Once the load commits, all users automatically see the new data.
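As a minimal sketch of the staging-table pattern described above (the staging table and source names are illustrative):

```sql
-- Load the fresh data into a staging table first.
CREATE OR REPLACE TABLE prod.report_table_staging AS
SELECT * FROM source_system_view;   -- source name is illustrative

-- Atomically replace the target in a single commit;
-- readers keep seeing the old version until this statement finishes.
CREATE OR REPLACE TABLE prod.report_table AS
SELECT * FROM prod.report_table_staging;
```

Because the final replace is a single Delta commit, there is no window where the reports see an empty or half-loaded table.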
Tuesday
This is now elegantly solved by multi-table transactions in Databricks. Wrap the truncate-and-load logic in a single atomic transaction and you get consistency beyond a single table: even while the truncate and load is running, report readers continue to see the previous version.
Reports never encounter an empty table or a partial state; they see the new version only after the truncate and full load commits. If the load fails, the transaction rolls back automatically, leaving the reports on the old data.
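A sketch of what that transactional wrapper might look like (multi-statement transactions are a newer Databricks capability; the exact syntax may differ from this sketch, and the source name is illustrative):

```sql
BEGIN TRANSACTION;
  -- Both statements commit together, so readers never
  -- observe the truncated (empty) intermediate state.
  TRUNCATE TABLE prod.report_table;
  INSERT INTO prod.report_table
  SELECT * FROM source_system_view;   -- source name is illustrative
COMMIT;
```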
More details here
Tuesday
Hi @databrciks,
You can achieve this in multiple ways, as shown below. All of these patterns perform a copy-on-write overwrite: Databricks writes new data files and then atomically commits a new table version, so there is never a committed state where the table is empty. This makes them the safest options.
(
  df
  .write
  .format("delta")
  .mode("overwrite")                      # full refresh
  .option("overwriteSchema", "true")      # only if schema can change
  .saveAsTable("prod.report_table")
)

INSERT OVERWRITE TABLE prod.report_table
SELECT ...
FROM source_...;

CREATE OR REPLACE TABLE prod.report_table AS
SELECT ...
FROM source_...;
What you don't want to do is a plain truncate and load, as shown below.
TRUNCATE TABLE prod.report_table;
INSERT INTO prod.report_table SELECT ...;
This is an anti-pattern because between the TRUNCATE commit and the INSERT commit, any new query sees an empty table.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.