07-15-2025 03:37 AM
How to Read and Write Data between 2 Separate Instances of Databricks.
I want to have bi-directional data read and write between Databricks A and Databricks B. They are not in the same instance.
Please help
yesterday - last edited yesterday
Here are some patterns that you can utilize:
1. If the workspaces are in different Databricks accounts or different Azure regions, the recommended approach is Delta Sharing.
The simplest, most governed way to let A read B's tables and B read A's tables without copying. Each side remains the writer to its own storage; the other side gets read-only access via shares (make it bi-directional by creating reciprocal shares; see the reciprocal example after the code below). Works across different accounts, regions, even clouds, as long as both workspaces are Unity Catalog (UC) enabled.
https://docs.databricks.com/aws/en/delta-sharing/share-data-databricks
%sql
-- On instance A, run this code.
-- Create a share and add objects
CREATE SHARE a_to_b_share;
ALTER SHARE a_to_b_share ADD TABLE main.sales.orders;
-- Create a recipient using B's UC metastore sharing identifier
CREATE RECIPIENT b_recipient USING ID '<B_METASTORE_SHARING_ID>';
-- Grant access
GRANT SELECT ON SHARE a_to_b_share TO RECIPIENT b_recipient;
%sql
-- On instance B, run this code.
-- Create a catalog from provider A's share
CREATE CATALOG a_shared USING SHARE provider_a.a_to_b_share;
-- Query it like local data (schema and table names follow the shared table)
SELECT * FROM a_shared.sales.orders LIMIT 10;
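To make it bi-directional, mirror the same steps in the opposite direction. A minimal sketch, assuming a hypothetical table main.finance.invoices on B, A's own metastore sharing identifier, and provider_b as the provider name that appears in A's metastore:
%sql
-- On instance B: share one of B's tables back to A
CREATE SHARE b_to_a_share;
ALTER SHARE b_to_a_share ADD TABLE main.finance.invoices;
CREATE RECIPIENT a_recipient USING ID '<A_METASTORE_SHARING_ID>';
GRANT SELECT ON SHARE b_to_a_share TO RECIPIENT a_recipient;
%sql
-- On instance A: mount B's share as a catalog and query it read-only
CREATE CATALOG b_shared USING SHARE provider_b.b_to_a_share;
SELECT * FROM b_shared.finance.invoices LIMIT 10;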
2. If the workspaces are in the same Databricks account, use shared cloud object storage via Unity Catalog external locations. Both A and B can read and write to cloud storage (ADLS) behind UC governance using storage credentials and external locations. This is how you do actual cross-workspace writes. But don't concurrently write to the same Delta path from two different metastores: Databricks explicitly warns that while reading from the same external location from multiple metastores is safe, concurrent writes to the same path can cause consistency issues. If A and B use different UC metastores, partition write ownership by path/table (e.g., A writes /data/A/*, B writes /data/B/*), or consolidate both workspaces under a single UC metastore if you truly need multi-writer to the same table.
Use a managed identity for your UC-enabled workspace; grant it Storage Blob Data Contributor on your ADLS Gen2 container. Then:
%sql
-- Create a storage credential backed by an Azure managed identity
-- (point it at your Databricks access connector; check the CREATE STORAGE CREDENTIAL docs for the exact clause)
CREATE STORAGE CREDENTIAL mi_cred
WITH (AZURE_MANAGED_IDENTITY (ACCESS_CONNECTOR_ID = '<access-connector-resource-id>'))
COMMENT 'MI for cross-workspace access';
-- Create an external location over your ADLS Gen2 path
CREATE EXTERNAL LOCATION extloc_shared
URL 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared'
WITH (STORAGE CREDENTIAL mi_cred);
-- Grant read/write to a UC group used by both workspaces
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION extloc_shared TO `data_x_ws_group`;
Now both workspaces can use that path in UC-managed tables or external Delta tables, for example:
%sql
-- On instance A: create an external Delta table over A's authored area
-- (shared_cat.shared_schema is an illustrative catalog.schema; use your own)
CREATE TABLE shared_cat.shared_schema.orders_a
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/A/orders';
-- On instance B: create an external Delta table over B's authored area
CREATE TABLE shared_cat.shared_schema.orders_b
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/lakehouse/shared/B/orders';
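A minimal usage sketch, assuming both workspaces are attached to the same UC metastore, the illustrative shared_cat.shared_schema names above, and a hypothetical order_date column: B reads A's authored table but writes only into its own authored table, which keeps write ownership partitioned as described in point 2.
%sql
-- Run on instance B: read from A's table, write only into B's table/path
INSERT INTO shared_cat.shared_schema.orders_b
SELECT * FROM shared_cat.shared_schema.orders_a
WHERE order_date = current_date();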
3. You can set up pipelines if you need to materialize copies in both workspaces: pipelines that replicate data in both directions and copy data, batch or streaming.
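For example, a minimal batch sketch, assuming the a_shared catalog created from the Delta share in point 1 and a hypothetical local target main.replicas.orders_from_a. Schedule it as a job in workspace B and mirror the equivalent in A for the other direction; incremental loads (e.g., MERGE) are the natural next step, a full refresh is shown for brevity.
%sql
-- Run on instance B on a schedule: materialize a local copy of A's shared table
CREATE OR REPLACE TABLE main.replicas.orders_from_a
AS SELECT * FROM a_shared.sales.orders;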
4. Direct connector to another Databricks workspace (JDBC), but not my first choice. You can connect from A to B using the Databricks connector (JDBC) and even create an external table USING databricks that points to B (sketch after the link below). This works, but Databricks recommends Delta Sharing for most data sharing operations; the JDBC pattern introduces token management and tighter coupling to remote compute/warehouses. Use it for ad hoc or transitional needs, not as your long-term backbone.
https://docs.azure.cn/en-us/databricks/connect/external-systems/databricks
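A rough sketch of that pattern, following the linked connector page; the host, httpPath, warehouse ID, and table name are placeholders, and the personal access token for workspace B should come from a secret scope rather than being inlined:
%sql
-- Run on instance A: external table that proxies a table living in workspace B
CREATE TABLE b_orders_via_jdbc
USING databricks
OPTIONS (
  host 'adb-<B-workspace-id>.azuredatabricks.net',
  httpPath '/sql/1.0/warehouses/<B-warehouse-id>',
  personalAccessToken secret('b_scope', 'b_token'),
  dbtable 'main.sales.orders'
);
-- Query it from A like any local table
SELECT * FROM b_orders_via_jdbc LIMIT 10;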
5. Clean Rooms might be another option. I haven't tried this option myself.
https://learn.microsoft.com/en-us/azure/databricks/clean-rooms/
Friday
Hello @Pratikmsbsvm, I want to better understand what you mean by "instance." Do you mean two separate workspaces within the same ADB account, or two different ADB accounts? Please clarify so I can provide guidance.
Regards, Louis.
yesterday
Also, are both instances Unity Catalog enabled?