08-17-2025 10:51 AM
Hi,
Could someone please help me with just the key points that should be part of the High Level Design and Low Level Design when transferring data from one Databricks account to another Databricks account using Unity Catalog? The first load is a full data transfer, and after that only incremental loads.
Please help me with the points that should be part of the HLD and then the LLD.
Thanks a lot.
08-17-2025 11:56 AM - edited 08-17-2025 12:03 PM
Hi @Datalight ,
In this scenario (account to account, and I'm assuming they are on different metastores) it's recommended to use Delta Sharing. You can read about it and its key features here:
What is Delta Sharing? - Azure Databricks | Microsoft Learn
You can read about how to set up Delta Sharing at the link below:
Set up Delta Sharing for your account (for providers) - Azure Databricks | Microsoft Learn
Regarding incremental loading, Delta Sharing supports sharing the Change Data Feed for Delta tables. This is an excellent way for data recipients to keep track of incremental changes as they are made by the data provider. Data recipients can read only the changes that have been made to a table, rather than having to re-read the entire dataset to get the latest snapshot.
You can read about it in the article below:
Use Delta Lake change data feed on Azure Databricks - Azure Databricks | Microsoft Learn
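As a rough illustration, here's how a recipient on account B might read only the incremental changes from a shared table, assuming the provider shared the table with history/CDF enabled and the share is already mounted as a catalog (the setup is sketched further below). All catalog, schema, and table names here are hypothetical.
```python
# Minimal sketch (recipient side, account B), assuming the provider shared the table
# WITH HISTORY / CDF enabled and the share is mounted as the hypothetical catalog
# `shared_catalog`. `spark` is the SparkSession a Databricks notebook provides.

last_processed_version = 42  # track this yourself, e.g. in a small control table

changes_df = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", last_processed_version + 1)
    .table("shared_catalog.sales.orders")
)

# Persist only the changes into account B's own Unity Catalog table
# (whose storage location is B's ADLS Gen2).
changes_df.write.mode("append").saveAsTable("local_catalog.sales.orders_changes")
```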
Regarding the high-level design, it would involve:
On Account A (Provider)
- Create a share and add the tables you want to share
- Create a recipient (if Databricks-to-Databricks): either create the recipient object yourself, or let the recipient request access and approve it. You can scope it to a particular workspace or external identity. See the docs for exact steps.
- Grant the recipient SELECT on the share (or accept their request). The recipient will receive access to the live table metadata and data through Delta Sharing (see the SQL sketch below).
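A minimal sketch of the provider-side setup (account A), run from a notebook; all object names are hypothetical and assume you have the required metastore privileges.
```python
# Provider side (account A) - hypothetical names, run in a Databricks notebook.

# 0. Make sure the source table emits a Change Data Feed (needed for incremental reads).
spark.sql("""
    ALTER TABLE prod_catalog.sales.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# 1. Create a share and add the table; WITH HISTORY also shares table history
#    so the recipient can read the Change Data Feed.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
spark.sql("ALTER SHARE sales_share ADD TABLE prod_catalog.sales.orders WITH HISTORY")

# 2. Create a Databricks-to-Databricks recipient using the sharing identifier
#    of account B's metastore (the recipient can look it up and send it to you).
spark.sql("CREATE RECIPIENT IF NOT EXISTS account_b USING ID '<sharing-identifier-of-B>'")

# 3. Grant the recipient read access to the share.
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT account_b")
```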
On Account B (Recipient)
- Connect to the provider share (Catalog Explorer → Delta Sharing → Add provider, or accept the provider's invite). This mounts the provider's share as a read-only catalog that you can query like any other table (see the sketch below).
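And a corresponding sketch for the recipient side (account B); the provider and share names are hypothetical and would show up under Catalog Explorer → Delta Sharing.
```python
# Recipient side (account B) - hypothetical names.

# See which providers have shared data with this metastore.
spark.sql("SHOW PROVIDERS").show()

# Mount the provider's share as a read-only catalog.
spark.sql("CREATE CATALOG IF NOT EXISTS shared_catalog USING SHARE account_a_provider.sales_share")

# Query the shared table like any other Unity Catalog table.
spark.table("shared_catalog.sales.orders").show(5)
```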
For the low-level design, just refer to the documentation; there's no better source:
Reference: Solved: Re: Data Transfer using Unity Catalog full impleme... - Databricks Community - 128218
a month ago
@szymon_dybczak : Thanks a lot. Could you please help me with how I can persist data on Account B (the recipient)?
I have to push the data to the recipient's ADLS Gen2.
Please share your thoughts.
a month ago
Do I need to execute a query to insert the data through an orchestration tool, either ADF or a Databricks workflow?
Kindly share your thoughts.
a month ago
The whole idea of Delta Sharing is that you write your data into your own account and create a share. Then recipient B can read from that share.
Maybe try watching the video below to grasp the general idea:
Delta Sharing in Action: Architecture and Best Practices - YouTube
a month ago
@szymon_dybczak : Thanks. Is there any way to do a write operation on the recipient's local storage? Pardon me if that sounds illogical.
Many Thanks
a month ago
If I understood your question correctly, you're asking whether the recipient of a share can write/update data in the share? If so, unfortunately this is not possible.
So, you have access to all the data that the provider (let's say account A) has shared with you. Any update that provider A makes to that data will be available to the recipient (B) in near-real time. But the recipient cannot modify the shared data itself.
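If the goal is to have the data physically in B's own storage, one option (just a sketch with hypothetical names) is to read from the mounted share and write into a table in B's own catalog, whose storage lives in B's ADLS Gen2; that write could then be scheduled with a Databricks workflow or ADF.
```python
# Sketch: the share itself is read-only, but the recipient can materialize
# its own copy in a catalog backed by B's ADLS Gen2 (hypothetical names).
(
    spark.table("shared_catalog.sales.orders")
    .write.mode("overwrite")
    .saveAsTable("local_catalog.sales.orders_copy")
)
```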
a month ago
@szymon_dybczak : Hi,
Here, A and B are two different Databricks accounts.
Whenever new data arrives in A (ADLS Gen2), it should automatically be pushed to B (ADLS Gen2).
Both A and B are UC enabled.
Could you please help me with detailed steps on how I can achieve this?
Do I need to orchestrate the pipeline with a Databricks workflow or ADF?
Kindly share your thoughts.
a month ago
Again, if your data is written to account A, the changes should be visible nearly in real time in account B. This discussion won't make any sense if you don't dedicate one hour of your life to reading about the fundamental concepts. So, please watch the YouTube videos I've provided or read the documentation.
a month ago
Take a look at "delta sharing" and "deep cloning". I implemented this kind of solution for disaster recovery between regions using those features, in my case under the same account, but it could be helpful in your case as well. Take into account that DEEP CLONE works incrementally. KR.
a month ago
Following up on my previous reply: you can use DEEP CLONE to clone data incrementally between workspaces by including it in a scheduled job, but indeed this will not work in real time.
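For example, a scheduled job could run something like the following sketch (hypothetical names, assuming the source table is reachable from the workspace that runs the job); re-running it copies only the changes made since the previous clone.
```python
# Incremental DEEP CLONE sketch - re-running this statement only transfers
# the delta since the last clone, so it fits well in a scheduled Databricks job.
spark.sql("""
    CREATE OR REPLACE TABLE target_catalog.sales.orders_replica
    DEEP CLONE source_catalog.sales.orders
""")
```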