szymon_dybczak
Esteemed Contributor III

Hi @Datalight ,

In this scenario (account to account, and I'm assuming the accounts use different metastores), it's recommended to use Delta Sharing. Key features of Delta Sharing:

  1. Open Protocol: Allows sharing data across different platforms, including Databricks, Snowflake, Apache Spark, and pandas.
  2. Real-time Data Access: Consumers always access the latest data without needing ETL pipelines or data duplication.
  3. Fine-Grained Access Control: With Unity Catalog, you can manage permissions at the catalog, schema, table, or even row level.
  4. Cross-Cloud Sharing: You can share data across different cloud providers (Azure, AWS, and GCP) or between different Databricks accounts.
  5. No Data Replication: Consumers query the shared Delta table directly from its storage location.

You can read about it here:

What is Delta Sharing? - Azure Databricks | Microsoft Learn

You can read about how to set up Delta Sharing at the link below:

Set up Delta Sharing for your account (for providers) - Azure Databricks | Microsoft Learn

Regarding incremental loading, Delta Sharing supports sharing the Change Data Feed of Delta tables. This is an excellent way for data recipients to keep track of incremental changes as the data provider makes them. Recipients can read only the changes that have been made to a table, rather than re-reading the entire dataset to get the latest snapshot.
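As a rough illustration of the incremental read on the recipient side, here is a minimal Python sketch. It assumes the provider has enabled the change data feed on the table and shared it with history, that the share is already mounted as a catalog, and that you track the last processed version yourself. The helper names and the table name in the comment are hypothetical; `readChangeFeed` and `startingVersion` are the standard Delta reader options.

```python
def change_feed_options(starting_version: int) -> dict:
    """Reader options for a Delta change data feed read."""
    if starting_version < 0:
        raise ValueError("starting_version must be >= 0")
    return {"readChangeFeed": "true", "startingVersion": str(starting_version)}


def read_changes(spark, table_name: str, starting_version: int):
    """Return a DataFrame of row-level changes since starting_version.

    `spark` is an active SparkSession (e.g. the one a Databricks notebook
    provides). `table_name` is the three-level name of the shared table
    in the recipient's catalog.
    """
    return (
        spark.read
        .options(**change_feed_options(starting_version))
        .table(table_name)
    )


# Hypothetical usage in a recipient notebook:
# changes = read_changes(spark, "shared_catalog.sales.orders", starting_version=5)
# changes.filter("_change_type != 'update_preimage'").show()
```

The returned rows carry the `_change_type`, `_commit_version` and `_commit_timestamp` columns, so the recipient can store the highest `_commit_version` it has processed and pass the next version on the following run.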

You can read about it in the article below:

Use Delta Lake change data feed on Azure Databricks - Azure Databricks | Microsoft Learn

Regarding high level design - it would involve:

On Account A (Provider)

- Create a share and add the tables you want to share.
- Create a recipient (if Databricks-to-Databricks): either create the recipient object yourself, or let the recipient request access and approve the request. You can set it to a particular workspace or external identity. See the docs for exact steps.
- Grant the recipient SELECT on the share (or accept their request). The recipient then gets access to the live table metadata and data through Delta Sharing.
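The provider steps above can be sketched as a short list of SQL statements, built here in Python so they can be run one by one with `spark.sql` in a notebook. This is only a sketch under assumptions: the share, table and recipient names and the sharing identifier are hypothetical placeholders, and `WITH HISTORY` is included on the assumption that the recipient needs the change data feed.

```python
def provider_statements(share, tables, recipient, sharing_id):
    """Build provider-side SQL for Databricks-to-Databricks sharing."""
    stmts = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # WITH HISTORY lets the recipient read table history / change data feed.
    stmts += [f"ALTER SHARE {share} ADD TABLE {t} WITH HISTORY" for t in tables]
    # The sharing identifier comes from the recipient's metastore.
    stmts.append(
        f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_id}'"
    )
    stmts.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return stmts


if __name__ == "__main__":
    # All names below are placeholders, not values from a real account.
    for stmt in provider_statements(
        share="sales_share",
        tables=["main.sales.orders"],
        recipient="account_b",
        sharing_id="<cloud>:<region>:<metastore-uuid>",
    ):
        print(stmt)  # in a Databricks notebook: spark.sql(stmt)
```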

On Account B (Recipient)

- Connect to the provider share (Catalog Explorer → Delta Sharing → Add provider or accept provider invite). This mounts the provider share as a read-only catalog, and you can then query the shared tables like any other table.
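If you prefer SQL over Catalog Explorer for the recipient step, a minimal sketch could look like this. The provider, share, catalog, schema and table names are hypothetical; the provider name is whatever appears under Delta Sharing providers in your metastore.

```python
def mount_share_statement(provider, share, catalog):
    """SQL that mounts a provider's share as a read-only catalog."""
    return f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"


def sample_query(catalog, schema, table, limit=10):
    """A simple query against a table in the mounted catalog."""
    return f"SELECT * FROM {catalog}.{schema}.{table} LIMIT {limit}"


if __name__ == "__main__":
    # Placeholder names; in a Databricks notebook pass each string to spark.sql.
    print(mount_share_statement("account_a", "sales_share", "shared_sales"))
    print(sample_query("shared_sales", "sales", "orders"))
```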

For low-level design, just refer to the documentation; there's no better source:

Share data using the Delta Sharing Databricks-to-Databricks protocol (for providers) - Azure Databri...

Create and manage data recipients for Delta Sharing (Databricks-to-Databricks sharing) - Azure Datab...

Read data shared using Databricks-to-Databricks Delta Sharing (for recipients) - Azure Databricks | ...

Reference: Solved: Re: Data Transfer using Unity Catalog full impleme... - Databricks Community - 128218