Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Read and Write Data Between Two Separate Instances of Databricks

Pratikmsbsvm
Contributor

How to read and write data between two separate instances of Databricks.

I want to have bi-directional data read and write between Databricks A and Databricks B. They are not in the same instance.

[Attached image: Pratikmsbsvm_0-1752575827266.png]

Please help

1 ACCEPTED SOLUTION

Accepted Solutions

nayan_wylde
Honored Contributor III

Here are some patterns that you can utilize:

1. If the workspaces are in different Databricks accounts or different Azure regions, the recommended approach is Delta Sharing.
It is the simplest, most governed way to let A read B's tables and B read A's tables without copying. Each side remains the writer of its own storage; the other side gets read-only access via shares (make it bi-directional by creating reciprocal shares). It works across different accounts, regions, and even clouds, as long as both workspaces are Unity Catalog (UC) enabled.

https://docs.databricks.com/aws/en/delta-sharing/share-data-databricks

%sql

-- On instance A, run this code.

-- Create a share and add objects
CREATE SHARE a_to_b_share;
ALTER SHARE a_to_b_share ADD TABLE main.sales.orders;

-- Create a recipient using B's UC metastore sharing identifier
CREATE RECIPIENT b_recipient USING ID '<B_METASTORE_SHARING_ID>';

-- Grant access
GRANT SELECT ON SHARE a_to_b_share TO RECIPIENT b_recipient;
%sql

-- On instance B, run this code.

-- Create a catalog from provider A's share
CREATE CATALOG a_shared FROM SHARE provider_a.a_to_b_share;

-- Query it like local data (the shared table keeps its schema name under the mounted catalog)
SELECT * FROM a_shared.sales.orders LIMIT 10;

2. If the workspaces are in the same Databricks account, use shared cloud object storage via Unity Catalog external locations. Both A and B can read and write cloud storage (ADLS) under UC governance using storage credentials and external locations. This is how you do actual cross-workspace writes. But don't write concurrently to the same Delta path from two different metastores: Databricks explicitly warns that while reading the same external location from multiple metastores is safe, concurrent writes to the same path can cause consistency issues. If A and B use different UC metastores, partition write ownership by path/table (e.g., A writes /data/A/*, B writes /data/B/*), or consolidate both workspaces under a single UC metastore if you truly need multiple writers on the same table.

Use a managed identity for your UC-enabled workspace (on Azure, typically the managed identity of a Databricks access connector); grant it Storage Blob Data Contributor on your ADLS Gen2 container. Then:

%sql
-- Create a storage credential backed by managed identity
CREATE STORAGE CREDENTIAL mi_cred
  WITH AZURE_MANAGED_IDENTITY
  COMMENT 'MI for cross-workspace access';

-- Create an external location over your ADLS Gen2 path
CREATE EXTERNAL LOCATION extloc_shared
  URL 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared'
  WITH (STORAGE CREDENTIAL mi_cred);

-- Grant read/write to a UC group used by both workspaces
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION extloc_shared TO `data_x_ws_group`;

Now both workspaces can use that path for UC-managed tables or external Delta tables, for example:

%sql
-- Create an external Delta table over A's authored area (catalog and schema names are illustrative)
CREATE TABLE shared.sales.orders_a
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/A/orders';

-- B writes to its own authored area
CREATE TABLE shared.sales.orders_b
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/B/orders';
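
For example, here is a minimal sketch of the single-writer discipline from B's side, assuming the paths and table names above (the INSERT itself is purely illustrative):

%sql
-- On workspace B: read A's authored area directly from the governed path (read-only for B)
-- and write only into B's own table, so each Delta path keeps a single writer
INSERT INTO shared.sales.orders_b
SELECT * FROM delta.`abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/A/orders`;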

3. You can set up pipelines if you need to materialize copies in both workspaces: jobs that replicate data in both directions, batch or streaming (a minimal sketch follows).
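
A minimal batch sketch, assuming the Delta Share from option 1 is already mounted on B as the a_shared catalog (the target catalog and schema names here are illustrative); schedule it as a job, or switch to MERGE / Structured Streaming for incremental replication:

%sql
-- On workspace B: materialize a local copy of A's shared table
CREATE OR REPLACE TABLE main.replicas.orders_from_a AS
SELECT * FROM a_shared.sales.orders;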

4. Direct connector to another Databricks workspace (JDBC), though not my first choice. You can connect from A to B using the Databricks connector (JDBC) and even create an external table USING databricks that points to B. This works, but Databricks recommends Delta Sharing for most data sharing; the JDBC pattern introduces token management and tighter coupling to the remote compute/warehouse. Use it for ad hoc or transitional needs, not as your long-term backbone. A rough sketch follows the link below.
https://docs.azure.cn/en-us/databricks/connect/external-systems/databricks
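
As a rough sketch only, assuming the connector accepts host, httpPath, personalAccessToken, and dbtable options as in the linked docs (the workspace URL, warehouse path, and secret scope/key below are placeholders):

%sql
-- On workspace A: external table that proxies a table living in workspace B
CREATE TABLE orders_from_b
USING databricks
OPTIONS (
  host 'adb-<B-workspace-id>.azuredatabricks.net',
  httpPath '/sql/1.0/warehouses/<B-warehouse-id>',
  personalAccessToken secret('b_workspace_scope', 'b_pat'),
  dbtable 'main.sales.orders'
);

-- Query it from A like any other table
SELECT * FROM orders_from_b LIMIT 10;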

5. Clean Rooms might be another option. I haven't tried this option myself.
https://learn.microsoft.com/en-us/azure/databricks/clean-rooms/

 

 


3 REPLIES

Louis_Frolio
Databricks Employee

Hello @Pratikmsbsvm, I want to better understand what you mean by "instance". Do you mean two separate workspaces within the same ADB account, or two different ADB accounts? Please clarify so I can provide guidance.

Regards, Louis.

Louis_Frolio
Databricks Employee

Also, are both instances Unity Catalog enabled?

 
