Get Started Discussions

Delta table zero downtime

databrciks
New Contributor III

Hello Experts

How can I do a daily full load (truncate and load) into a Delta table? Reports are pointing to this table.

How can I ensure zero downtime while the load is running?

Thanks

3 ACCEPTED SOLUTIONS


Sumit_7
Honored Contributor II

@databrciks Load your new data into a staging table, then replace the target using CREATE OR REPLACE TABLE or INSERT OVERWRITE.
Delta Lake writes are atomic, so reports keep reading the old data until the new load finishes. There is no downtime and no empty table window. Once the load commits, all users automatically see the new data.
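The staging-then-swap idea can be sketched locally with Python's sqlite3 as a stand-in for Delta (table and column names here are made up, and sqlite's DROP plus RENAME plays the role of CREATE OR REPLACE TABLE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE report (id INTEGER, amount REAL)")
con.execute("INSERT INTO report VALUES (1, 10.0)")
con.commit()

# Step 1: load the full new snapshot into a staging table.
# Readers of `report` still see the old data while this runs.
con.execute("CREATE TABLE report_staging (id INTEGER, amount REAL)")
con.executemany("INSERT INTO report_staging VALUES (?, ?)",
                [(1, 99.0), (2, 5.0)])
con.commit()

# Step 2: swap in one transaction. Readers see either the old table
# or the fully loaded new one, never an empty or partial state.
with con:
    con.execute("DROP TABLE report")
    con.execute("ALTER TABLE report_staging RENAME TO report")
```

The swap is cheap because no data is copied at that point; only the staging load takes time, and it never blocks readers of the live table.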


balajij8
Contributor

@databrciks 

This is now neatly solved with multi-table transactions in Databricks. Wrap the truncate-and-load logic in a single atomic transaction and you get consistency beyond a single table. Even though you are truncating and reloading, report readers keep seeing the version from before the transaction until it commits.

Reports will never see an empty table or a partial state; they see the new version only once the truncate and full load are done. If the load fails, the transaction rolls back automatically, leaving the reports stable.
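That rollback behaviour can be illustrated with plain Python and sqlite3 (a stand-in only, not Databricks' multi-statement transaction syntax; the table name and rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE report (id INTEGER NOT NULL)")
con.executemany("INSERT INTO report VALUES (?)", [(1,), (2,)])
con.commit()

def truncate_and_load(rows):
    # The delete and the reload commit together;
    # any error rolls the whole block back.
    with con:
        con.execute("DELETE FROM report")
        con.executemany("INSERT INTO report VALUES (?)", rows)

# A failing load leaves the old data intact.
try:
    truncate_and_load([(10,), (None,)])   # NOT NULL violation mid-load
except sqlite3.IntegrityError:
    pass
survivors = [r[0] for r in con.execute("SELECT id FROM report ORDER BY id")]

# A successful load replaces everything in one commit.
truncate_and_load([(10,), (20,), (30,)])
```

After the failed call, `survivors` still holds the original rows; only the successful call swaps in the new data.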



Ashwin_DSA
Databricks Employee

Hi @databrciks,

You can achieve this in several ways, as shown below. All of these patterns perform a copy-on-write overwrite: Databricks writes new files and then atomically commits a new table version, so there is never a committed state in which the table is empty. Any of them is safe for your reports.

Option 1: overwrite via the DataFrame API

(
  df
  .write
  .format("delta")
  .mode("overwrite")                   # full refresh
  .option("overwriteSchema", "true")   # only if the schema can change
  .saveAsTable("prod.report_table")
)

Option 2: INSERT OVERWRITE

INSERT OVERWRITE TABLE prod.report_table
SELECT ...
FROM source_...;

Option 3: CREATE OR REPLACE TABLE

CREATE OR REPLACE TABLE prod.report_table AS
SELECT ...
FROM source_...;

What you do not want to do is a bare truncate and load, as shown below.

TRUNCATE TABLE prod.report_table;
INSERT INTO prod.report_table SELECT ...;

This is an anti-pattern because between the TRUNCATE commit and the INSERT commit, any new query sees an empty table.
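That empty-table window is easy to reproduce with two sqlite3 connections standing in for the ETL writer and a report reader (file path and table name are made up):

```python
import sqlite3
import tempfile
import os

# Two connections to one database file: the ETL writer and a report reader.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE report (id INTEGER)")
writer.execute("INSERT INTO report VALUES (1)")
writer.commit()

# Anti-pattern: the truncate commits on its own, before the reload.
writer.execute("DELETE FROM report")
writer.commit()

# A report that runs in this window sees zero rows.
rows_in_window = reader.execute("SELECT COUNT(*) FROM report").fetchone()[0]

writer.execute("INSERT INTO report VALUES (2)")
writer.commit()
rows_after_load = reader.execute("SELECT COUNT(*) FROM report").fetchone()[0]
```

`rows_in_window` comes back as 0: exactly the state the overwrite patterns above never expose, because they publish the new data in a single commit.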

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

View solution in original post
