How to reduce data loss for Delta Lake on Azure when failing from primary to secondary regions?

YuriS
New Contributor II

Let's say we have a big data application where data loss is not an option.

With GZRS (geo-zone-redundant storage) we would achieve zero data loss as long as the primary region is alive: the writer waits for acknowledgements from two or more Azure availability zones in the primary region (if only one zone is available, the write fails). Once data has been written to the primary region, it is copied asynchronously to the secondary region, so when the primary goes down (flood/asteroid/you-name-it outage) there is a possibility of losing data.

Microsoft states that data is copied asynchronously, but the order of files does not appear to be guaranteed. That means if your primary region is down and you fail over to the secondary, there is a high probability that some or most of your Delta tables will be inconsistent. Imagine the Delta log files were successfully copied to the secondary but some Parquet files are missing: the table is inconsistent or out of sync, and reads fail with an error. Or your Parquet files are copied but the Delta log is not there yet: congratulations, you have lost some data and will not even notice, because reads still succeed (i.e. silent data loss).
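For what it's worth, the first failure mode (log replicated, data files missing) can at least be detected after failover by comparing the files the Delta log references against what physically landed in the secondary account. A minimal PySpark sketch, with a placeholder table path; note the opposite failure mode (missing log entries) cannot be caught this way:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical table location in the secondary region's storage account.
table_path = "abfss://data@secondaryaccount.dfs.core.windows.net/tables/events"

# Files the current Delta snapshot references, according to the transaction log.
referenced = set(spark.read.format("delta").load(table_path).inputFiles())

# Files physically present under the table path (recursive Hadoop FS listing).
jpath = spark._jvm.org.apache.hadoop.fs.Path(table_path)
fs = jpath.getFileSystem(spark._jsc.hadoopConfiguration())
present = set()
it = fs.listFiles(jpath, True)
while it.hasNext():
    present.add(it.next().getPath().toString())

missing = referenced - present
print(f"{len(missing)} of {len(referenced)} referenced files are missing on the secondary")

If the count is non-zero, the snapshot references data that never arrived, and the table has to be rolled back (e.g. RESTORE to an older, complete version) before serving reads.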

It also seems like RA-GZRS (read-access geo-zone-redundant storage) does not play well with Delta Lake, due to the same eventual-consistency issue…

And that is not the whole picture yet. Microsoft states that there is a Last Sync Time property that indicates the most recent time up to which data written to the primary region is guaranteed to have been replicated to the secondary. And rumor has it that we should not trust that property either, as it is unreliable…
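For reference, Last Sync Time is surfaced through the blob service's geo-replication stats; reading it takes a few lines with the azure-storage-blob SDK. The account URL below is a placeholder, and the call requires a readable secondary endpoint (RA-GRS/RA-GZRS):

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Hypothetical storage account; read access to the secondary must be enabled.
client = BlobServiceClient(
    account_url="https://primaryaccount.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

stats = client.get_service_stats()  # served from the secondary endpoint
geo = stats["geo_replication"]
print(geo["status"], geo["last_sync_time"])

Everything written to the primary before last_sync_time should exist on the secondary, but, as noted above, it is best treated as advisory, and it says nothing about Delta-level consistency between log and data files.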

Databricks mentions several times in the documentation that Deep Clone should be used to copy Delta tables from the primary to the secondary region in a consistent way. Sounds good in theory; it does not, however, cover many of the streaming cases.
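For readers who have not used it: a deep clone is a transactionally consistent, incremental copy taken through the Delta log rather than at the storage layer. A minimal sketch, with hypothetical table names and secondary path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Transactionally consistent copy into the secondary region's storage account.
# Re-running the same statement copies only the changes since the last run.
spark.sql("""
    CREATE OR REPLACE TABLE dr.events_replica
    DEEP CLONE prod.events
    LOCATION 'abfss://data@secondaryaccount.dfs.core.windows.net/dr/events'
""")

Scheduled as a job, this gives batch tables a well-defined RPO of roughly one clone interval.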

Take, for instance, stateless Delta-to-Delta streaming. Deep Clone does not preserve table history when data is copied (for all tables, not only streaming ones), so additional work is required on the primary to map processed offsets to the source table's history, in order to restart on the secondary from the new (and correct) offset; see the sketch below. Add to that stateful streams, Delta-to-Delta with multiple sources, and so on.
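To make the offset problem concrete: for a Delta source, the last processed position lives in the stream's checkpoint, as a reservoirVersion field in the newest offsets file. A rough, Databricks-only sketch of digging it out (dbutils is the notebook built-in; the checkpoint path is hypothetical):

import json

# Checkpoint of the primary streaming job; it is itself replicated
# asynchronously, so it may lag behind the primary too.
offsets_dir = "abfss://data@secondaryaccount.dfs.core.windows.net/chk/events/offsets"

# Offset files are named by batch id; pick the newest one.
batch_files = [f for f in dbutils.fs.ls(offsets_dir) if f.name.isdigit()]
newest = max(batch_files, key=lambda f: int(f.name)).path

# An offsets file is a header line followed by one JSON document per source;
# a Delta source records the table version it had read up to.
last_line = dbutils.fs.head(newest).splitlines()[-1]
offset = json.loads(last_line)
print("last source version processed on primary:", offset["reservoirVersion"])

That version then has to be reconciled with the source version the last deep clone actually captured; since the clone carries its own, different version history, this mapping is exactly the extra bookkeeping the question is about.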

Is there any straightforward way to minimize data loss when failing over from the primary to the secondary region? Has anyone managed to implement a successful primary-to-secondary failover?

1 REPLY

yoliti7727
Visitor

Hello!

You are correct that relying solely on Azure GZRS/RA-GRS leaves a high probability of Delta Lake inconsistency and potential silent data loss upon failover, because Azure's asynchronous, block-level replication does not guarantee the order or integrity of the Delta log and Parquet files. Since native cloud replication is insufficient, the most reliable strategy to minimize data loss is to implement active, application-level replication from the primary region. For batch tables, this means running scheduled deep clones to the secondary. For critical streaming data, the gold standard is to run a second, dedicated Structured Streaming job whose sole output sink is the secondary storage account, which ensures the consistency and commit order of the Delta tables there; see the sketch below. Upon failover, all writes must be stopped, a storage account failover performed, and then a consistency check (like FSCK REPAIR TABLE) run and the streams restarted from a known safe offset to recover the system.
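A sketch of that dual-sink pattern (all table names and paths are placeholders): the second job commits independently to the secondary account, so the secondary copy always carries a valid, ordered Delta log of its own, regardless of what storage geo-replication is doing:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Second, dedicated job on the primary workspace whose only sink (and
# checkpoint) is the secondary region's storage account. Its own commits
# produce a self-consistent Delta log there, independent of Azure's
# asynchronous geo-replication.
(
    spark.readStream.format("delta")
    .table("prod.events")
    .writeStream.format("delta")
    .option(
        "checkpointLocation",
        "abfss://data@secondaryaccount.dfs.core.windows.net/chk/events_dr",
    )
    .start("abfss://data@secondaryaccount.dfs.core.windows.net/dr/events")
)

# After failover, for tables that were replicated only at the storage layer,
# drop log entries pointing at files that never arrived before serving reads.
spark.sql("FSCK REPAIR TABLE dr.some_storage_replicated_table")

The dual-written tables themselves should not need the repair step, since their Delta log on the secondary was produced by real commits rather than file copies.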
