Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Read and Write Data Between Two Separate Instances of Databricks

Pratikmsbsvm
Contributor

How to read and write data between two separate instances of Databricks.

I want to have bi-directional data read and write between Databricks A and Databricks B. They are not in the same instance.

[Attached image: Pratikmsbsvm_0-1752575827266.png]

Please help

1 ACCEPTED SOLUTION

Accepted Solutions

nayan_wylde
Honored Contributor III

Here are some patterns that you can utilize:

1. If the workspaces are in different Databricks accounts or different Azure regions, the recommended approach is Delta Sharing.
It is the simplest, most governed way to let A read B's tables and B read A's tables without copying. Each side remains the writer of its own storage; the other side gets read-only access via shares (make it bi-directional by creating reciprocal shares). It works across different accounts, regions, and even clouds, as long as both workspaces are Unity Catalog (UC) enabled.

https://docs.databricks.com/aws/en/delta-sharing/share-data-databricks

%sql

-- On instance A, run this code.

-- Create a share and add objects
CREATE SHARE a_to_b_share;
ALTER SHARE a_to_b_share ADD TABLE main.sales.orders;

-- Create a recipient using B's UC metastore sharing identifier
CREATE RECIPIENT b_recipient USING ID '<B_METASTORE_SHARING_ID>';

-- Grant access
GRANT SELECT ON SHARE a_to_b_share TO RECIPIENT b_recipient;
%sql

-- On instance B, run this code.

-- Create a catalog from provider A's share
CREATE CATALOG a_shared FROM SHARE provider_a.a_to_b_share;

-- Query it like local data (the shared table keeps its schema name under the mounted catalog)
SELECT * FROM a_shared.sales.orders LIMIT 10;

2. If the workspaces are in the same Databricks account, use shared cloud object storage via Unity Catalog external locations. Both A and B can read and write cloud storage (ADLS) under UC governance using storage credentials and external locations. This is how you do actual cross-workspace writes. But don't write concurrently to the same Delta path from two different metastores: Databricks explicitly warns that while reading the same external location from multiple metastores is safe, concurrent writes to the same path can cause consistency issues. If A and B use different UC metastores, partition write ownership by path/table (e.g., A writes /data/A/*, B writes /data/B/*), or consolidate both workspaces under a single UC metastore if you truly need multiple writers on the same table.

Use a managed identity for your UC-enabled workspace (on Azure, typically the managed identity of a Databricks access connector); grant it Storage Blob Data Contributor on your ADLS Gen2 container. Then:

%sql
-- Create a storage credential backed by managed identity
CREATE STORAGE CREDENTIAL mi_cred
  WITH AZURE_MANAGED_IDENTITY
  COMMENT 'MI for cross-workspace access';

-- Create an external location over your ADLS Gen2 path
CREATE EXTERNAL LOCATION extloc_shared
  URL 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared'
  WITH (STORAGE CREDENTIAL mi_cred);

-- Grant read/write to a UC group used by both workspaces
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION extloc_shared TO `data_x_ws_group`;

Now both workspaces can use that path for UC-managed tables or external Delta tables, for example:

%sql
-- Create an external Delta table over A's authored area (catalog and schema names are illustrative)
CREATE TABLE shared.sales.orders_a
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/A/orders';

-- B writes to its own authored area
CREATE TABLE shared.sales.orders_b
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/B/orders';
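
For example, here is a minimal sketch of the single-writer discipline from B's side, assuming the paths and table names above (the INSERT itself is purely illustrative):

%sql
-- On workspace B: read A's authored area directly from the governed path (read-only for B)
-- and write only into B's own table, so each Delta path keeps a single writer
INSERT INTO shared.sales.orders_b
SELECT * FROM delta.`abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/A/orders`;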

3. You can set up pipelines if you need to materialize copies in both workspaces: jobs that replicate data in both directions, batch or streaming (a minimal sketch follows).
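
A minimal batch sketch, assuming the Delta Share from option 1 is already mounted on B as the a_shared catalog (the target catalog and schema names here are illustrative); schedule it as a job, or switch to MERGE / Structured Streaming for incremental replication:

%sql
-- On workspace B: materialize a local copy of A's shared table
CREATE OR REPLACE TABLE main.replicas.orders_from_a AS
SELECT * FROM a_shared.sales.orders;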

4. Direct connector to another Databricks workspace (JDBC), though not my first choice. You can connect from A to B using the Databricks connector (JDBC) and even create an external table USING databricks that points to B. This works, but Databricks recommends Delta Sharing for most data sharing; the JDBC pattern introduces token management and tighter coupling to the remote compute/warehouse. Use it for ad hoc or transitional needs, not as your long-term backbone. A rough sketch follows the link below.
https://docs.azure.cn/en-us/databricks/connect/external-systems/databricks
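
As a rough sketch only, assuming the connector accepts host, httpPath, personalAccessToken, and dbtable options as in the linked docs (the workspace URL, warehouse path, and secret scope/key below are placeholders):

%sql
-- On workspace A: external table that proxies a table living in workspace B
CREATE TABLE orders_from_b
USING databricks
OPTIONS (
  host 'adb-<B-workspace-id>.azuredatabricks.net',
  httpPath '/sql/1.0/warehouses/<B-warehouse-id>',
  personalAccessToken secret('b_workspace_scope', 'b_pat'),
  dbtable 'main.sales.orders'
);

-- Query it from A like any other table
SELECT * FROM orders_from_b LIMIT 10;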

5. Clean Rooms might be another option. I haven't tried this option myself.
https://learn.microsoft.com/en-us/azure/databricks/clean-rooms/

 

 


3 REPLIES

Louis_Frolio
Databricks Employee

Hello @Pratikmsbsvm, I want to better understand what you mean by "instance". Do you mean two separate workspaces within the same ADB account, or two different ADB accounts? Please clarify so I can provide guidance.

Regards, Louis.

Louis_Frolio
Databricks Employee

Also, are both instances Unity Catalog enabled?

 
