Let’s say we have a big data application where data loss is not an option.
With GZRS (geo-zone-redundant storage) redundancy we would achieve zero data loss as long as the primary region is alive: the writer waits for acknowledgements from two or more Azure availability zones in the primary region (if only one zone is available, the write does not succeed). Once data has been written to the primary region it is copied asynchronously to the secondary region, hence when the primary is down (flood/asteroid/you-name-it outage) there is a possibility of losing data.
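Just to make the setup concrete, here is a rough way to confirm the account is actually provisioned with GZRS via the azure-mgmt-storage SDK (the subscription, resource group and account names below are placeholders):

```python
# Sketch: check the storage account's replication SKU; all names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # hypothetical
client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

account = client.storage_accounts.get_properties("my-resource-group", "mylakestorage")  # hypothetical names
print(account.sku.name)  # e.g. 'Standard_GZRS', or 'Standard_RAGZRS' if read access on the secondary is enabled
```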
Microsoft states that data is copied asynchronously, but it seems the order of files is not guaranteed to be preserved. That means if your primary region is down and you fail over to the secondary, there is a high probability that some or most of your Delta tables will be inconsistent. Imagine the delta-log files were successfully copied to the secondary but some parquet files are missing – the table is inconsistent or out of sync, and reads will fail with errors. Or your parquet files are copied but the delta-log is not there yet – congratulations, you have lost some data and you will not even notice it, because reads will still succeed (i.e. silent data loss).
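To illustrate what “inconsistent” means in practice, here is a rough probe (my own sketch, nothing official) one could run against the secondary after failover: take the data files referenced by `add` actions in the `_delta_log` and check whether they actually exist in storage. The table path is hypothetical, and the probe ignores `remove` actions, so it can over-report on tables with deletes or compaction – it is only meant to show the failure mode:

```python
from pyspark.sql.functions import col

table_path = "abfss://lake@secondaryaccount.dfs.core.windows.net/silver/events"  # hypothetical

# Relative paths of data files referenced by 'add' actions in the transaction log
referenced = [
    r.path
    for r in spark.read.json(f"{table_path}/_delta_log/*.json")
                 .where(col("add").isNotNull())
                 .select(col("add.path").alias("path"))
                 .distinct()
                 .collect()
]

# Probe each referenced file on the secondary; dbutils.fs.ls raises if it is missing
missing = []
for rel in referenced:
    try:
        dbutils.fs.ls(f"{table_path}/{rel}")
    except Exception:
        missing.append(rel)

print(f"{len(missing)} of {len(referenced)} referenced data files are missing on the secondary")
```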
It also seems like RA-GZRS (read-access geo-zone-redundant storage) does not play well with Delta Lake, due to the same eventual-consistency issue…
And this is not the whole picture yet. Microsoft states that there is a Last Sync Time property that indicates the most recent time at which data from the primary region is guaranteed to have been written to the secondary. And rumor has it we should not trust that property either, as it is unreliable…
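For reference, the property can be read through the Blob SDK against the secondary endpoint (the account name and credential below are placeholders) – whether the value can be trusted is exactly the question:

```python
# Sketch: read geo-replication stats (Last Sync Time) from the secondary endpoint.
# Only works when read access to the secondary (RA-GRS/RA-GZRS) is enabled.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account = "mylakestorage"  # hypothetical
client = BlobServiceClient(
    account_url=f"https://{account}-secondary.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

geo = client.get_service_stats()["geo_replication"]
print(geo["status"], geo["last_sync_time"])  # e.g. 'live' and the last guaranteed sync time
```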
Databricks mentions several times in its documentation that Deep Clone should be used to copy Delta tables from the primary to the secondary region in a consistent way. Sounds good in theory; it does not, however, cover many of the streaming cases.
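Something along these lines (storage paths are placeholders) – the appeal is that the clone itself is transactionally consistent:

```python
# A minimal deep clone sketch; both paths are hypothetical.
spark.sql("""
    CREATE OR REPLACE TABLE delta.`abfss://lake@secondaryaccount.dfs.core.windows.net/silver/events`
    DEEP CLONE delta.`abfss://lake@primaryaccount.dfs.core.windows.net/silver/events`
""")
```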
Let’s take, for instance, stateless delta-to-delta streaming. Deep Clone does not preserve table history when data is copied (for all tables, not only streaming ones), so some additional work is required on the primary to map the processed streaming offsets onto the source table’s history, in order to restart on the secondary from a new (and correct) offset. Add to that stateful streams, delta-to-delta with multiple sources, and so on.
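To give an idea of the extra bookkeeping I mean on the primary (all paths are placeholders, and the single-Delta-source offset layout is my assumption, not anything documented): read the last committed offset from the streaming checkpoint and join it onto the source table’s history, so that the boundary can be stored alongside the clone and used to pick the restart point on the secondary:

```python
import json

checkpoint = "abfss://lake@primaryaccount.dfs.core.windows.net/checkpoints/events_stream"  # hypothetical
source_tbl = "abfss://lake@primaryaccount.dfs.core.windows.net/silver/events"              # hypothetical

# Last committed offset file written for a single Delta streaming source
offset_files = sorted(
    (f for f in dbutils.fs.ls(f"{checkpoint}/offsets") if not f.name.endswith(".tmp")),
    key=lambda f: int(f.name),
)
last_line = dbutils.fs.head(offset_files[-1].path).splitlines()[-1]
last_offset = json.loads(last_line)                 # assumed to contain reservoirVersion / index
processed_version = last_offset["reservoirVersion"]

# Map that source version onto the table's commit history (timestamp, operation, ...)
history = spark.sql(f"DESCRIBE HISTORY delta.`{source_tbl}`")
boundary = (history.where(f"version = {processed_version}")
                   .select("version", "timestamp")
                   .first())
print(boundary)  # persist this alongside the clone; restart the stream on the secondary from it
```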
Is there any straightforward way to minimize data loss when failing over from the primary to the secondary? Has anyone managed to implement a successful primary-to-secondary failover?