Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Data Lakehouse architecture with Azure Databricks and Unity Catalog

Pratikmsbsvm
Contributor

I am creating a data lakehouse solution on Azure Databricks.

Sources: SAP, Salesforce, Adobe

Targets: Hightouch (external application), Mad Mobile (external application)

The lakehouse also holds transactional records, which must be stored with ACID guarantees.

The real challenge is that there is one more Databricks workspace, running as a separate instance, which also needs data from the lakehouse.

Could someone please help me with what the architecture should look like?

Thanks a lot.

1 ACCEPTED SOLUTION

Accepted Solutions

CURIOUS_DE
Contributor III

@Pratikmsbsvm 
You can use the architecture below as a reference for your solution.

Your Setup at a Glance

Sources

  • SAP, Salesforce, Adobe (Structured & Semi-structured)

Targets

  • Hightouch, Mad Mobile (External downstream apps needing curated data)

Core Requirement

  • Data must be stored in an ACID-compliant format → ✅ use Delta Lake (managed tables are preferred; if company constraints require it, external locations also work)

Cross-Workspace Data Sharing

  • Another Databricks instance (separate workspace) needs access to this lakehouse data

 

Recommended Architecture (High-Level View)

[ SAP / Salesforce / Adobe ]
             │
             ▼
Ingestion Layer (via ADF / Synapse / Partner Connectors / REST API)
             │
             ▼
┌─────────────────────────────┐
│ Azure Data Lake Gen2        │  (storage layer - centralized)
│ + Delta Lake for ACID       │
└─────────────────────────────┘
             │
             ▼
Azure Databricks (Primary Workspace)
 ├─ Bronze: Raw Data
 ├─ Silver: Cleaned & Transformed
 └─ Gold: Aggregated / Business Logic Applied
             │
             ├──> Load to Hightouch / Mad Mobile (via REST APIs / Hightouch Sync)
             └──> Share curated Delta tables with the other Databricks workspace (via Delta Sharing or external table access)

Key Components & Patterns

1. Ingestion Options

  • Use Azure Data Factory or partner connectors (such as Fivetran, which we use often in our projects) to ingest data from the following (see the ingestion sketch after this list):

    • SAP → via OData / RFC connectors

    • Salesforce → via REST / Bulk API

    • Adobe → via API or S3 data export
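
Where a connector lands raw files in ADLS, a common pattern is Databricks Auto Loader streaming them into a Bronze Delta table. A minimal PySpark sketch; the landing path, checkpoint location, and target table name are hypothetical placeholders:

# Auto Loader sketch: incrementally ingest raw Salesforce exports into Bronze.
# The abfss paths and the target table name are illustrative placeholders.
from pyspark.sql import functions as F

landing_path = "abfss://landing@<storage_account>.dfs.core.windows.net/salesforce/accounts/"
checkpoint = "abfss://landing@<storage_account>.dfs.core.windows.net/_checkpoints/salesforce_accounts/"

(spark.readStream
    .format("cloudFiles")                          # Auto Loader source
    .option("cloudFiles.format", "json")           # format of the landed files
    .option("cloudFiles.schemaLocation", checkpoint)
    .load(landing_path)
    .withColumn("_ingested_at", F.current_timestamp())
    .writeStream
    .option("checkpointLocation", checkpoint)
    .trigger(availableNow=True)                    # process new files, then stop
    .toTable("main.bronze.salesforce_accounts"))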

2. Storage & Processing Layer

  • Store all raw and processed data in ADLS Gen2, in Delta Lake format

  • Organize the lakehouse into zones (a Bronze → Silver → Gold sketch follows this list):

    • Bronze: Raw ingested files

    • Silver: Cleaned & de-duplicated

    • Gold: Ready for consumption (BI / API sync)
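
A minimal PySpark sketch of the Silver and Gold hops, assuming the hypothetical Bronze table from the ingestion sketch above:

# Bronze -> Silver: clean and de-duplicate; Silver -> Gold: aggregate for consumers.
# Table and column names are illustrative placeholders.
from pyspark.sql import functions as F

silver = (spark.table("main.bronze.salesforce_accounts")
    .filter(F.col("Id").isNotNull())        # basic quality gate
    .dropDuplicates(["Id"])                 # de-duplicate on the business key
    .select("Id", "Name", "Industry", "AnnualRevenue", "_ingested_at"))
silver.write.mode("overwrite").saveAsTable("main.silver.accounts")

gold = (spark.table("main.silver.accounts")
    .groupBy("Industry")
    .agg(F.count("*").alias("account_count"),
         F.sum("AnnualRevenue").alias("total_revenue")))
gold.write.mode("overwrite").saveAsTable("main.gold.revenue_by_industry")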

3. Cross-Workspace Databricks Access (this is your core challenge and the most important part)

Option A: Delta Sharing (Recommended if in different orgs/subscriptions)

  • Securely share Delta tables from one workspace to another without copying data

  • Works across different cloud accounts
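
On the provider side, a share takes only a few SQL statements (Unity Catalog required). A sketch run from a notebook; the share, recipient, and table names are hypothetical placeholders:

# Provider-side Delta Sharing sketch (requires Unity Catalog).
# Share, recipient, and table names are illustrative placeholders.
spark.sql("CREATE SHARE IF NOT EXISTS curated_share")
spark.sql("ALTER SHARE curated_share ADD TABLE main.gold.revenue_by_industry")

# For Databricks-to-Databricks sharing, identify the recipient by the
# sharing identifier of its Unity Catalog metastore.
spark.sql("""
    CREATE RECIPIENT IF NOT EXISTS other_workspace
    USING ID 'azure:<region>:<metastore-uuid>'
""")
spark.sql("GRANT SELECT ON SHARE curated_share TO RECIPIENT other_workspace")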

Option B: Direct ADLS access via a service principal or mount (only if both workspaces are in the same Azure AD tenant; see the configuration sketch below)

  • Point both Databricks workspaces at the same ADLS Gen2 storage account, authenticating with a service principal (or a legacy mount)

  • The other workspace can then access the tables directly, provided permissions are aligned via groups (managed in the Databricks account console)
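
A sketch of the service-principal OAuth configuration for direct abfss access, set identically on clusters in both workspaces; the storage account, tenant ID, and secret scope/key names are hypothetical placeholders:

# Service-principal (OAuth 2.0) access to the shared ADLS Gen2 account.
# <storage_account>, <tenant-id>, and the secret scope/keys are placeholders.
sa = "<storage_account>"

spark.conf.set(f"fs.azure.account.auth.type.{sa}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{sa}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{sa}.dfs.core.windows.net",
               dbutils.secrets.get("adls-scope", "sp-client-id"))
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{sa}.dfs.core.windows.net",
               dbutils.secrets.get("adls-scope", "sp-client-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{sa}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Either workspace configured this way can read the same Delta paths directly:
df = spark.read.format("delta").load(f"abfss://gold@{sa}.dfs.core.windows.net/revenue_by_industry")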

Option C: Data Replication with Jobs

  • Periodically replicate key Delta tables to the secondary Databricks workspace using scheduled jobs, Auto Loader, or Delta deep clones (see the sketch below)
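
One simple replication job is a Delta deep clone, which re-runs incrementally. A sketch, assuming the secondary workspace can resolve the source table; table names are hypothetical placeholders:

# Replication sketch: a scheduled job in the secondary workspace that deep-clones
# a curated table. Re-running copies only files changed since the last run.
# Source and target table names are illustrative placeholders.
spark.sql("""
    CREATE OR REPLACE TABLE replica.gold.revenue_by_industry
    DEEP CLONE main.gold.revenue_by_industry
""")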

 

Governance / Security Recommendations

  • Use Unity Catalog (if available) for fine-grained access control (a grants sketch follows this list)

  • Encrypt data at rest (ADLS) and in transit

  • Use service principals or managed identities for secure access between services
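
With Unity Catalog, fine-grained access control is expressed as SQL grants. A minimal sketch; the `analysts` group and object names are hypothetical placeholders:

# Unity Catalog grants sketch. Group and object names are illustrative placeholders.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `analysts`")
spark.sql("GRANT SELECT ON TABLE main.gold.revenue_by_industry TO `analysts`")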

 

Summary Visual (Simplified)

Sources →             Ingestion →    Delta Lakehouse →         Destinations
[SAP, SFDC, Adobe]    [ADF, APIs]    [Bronze, Silver, Gold]    [Hightouch, Mad Mobile, Other DBX]
                                               ▲
                                               │
                      Cross-Workspace Access (Delta Sharing / Mounting / Jobs)

Let me know if this helps 🙂

 

Databricks Solution Architect


2 REPLIES 2

KaranamS
Contributor III

Hi @Pratikmsbsvm, from what I understand, you have a lakehouse on Azure Databricks and would like to share its data with another Databricks account or workspace. If Unity Catalog is enabled on your Azure Databricks account, you can leverage Delta Sharing to securely share the data with other Databricks accounts.

https://docs.databricks.com/aws/en/delta-sharing/

Feel free to post if this does not answer your question or if you need any specific details about this solution.
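
On the recipient side of a Databricks-to-Databricks share, the shared data surfaces as a regular catalog. A sketch with hypothetical provider and share names:

# Recipient-side sketch: attach a Delta Share as a catalog and query it.
# Provider and share names are illustrative placeholders.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS shared_lakehouse
    USING SHARE provider_metastore.curated_share
""")
spark.sql("SELECT * FROM shared_lakehouse.gold.revenue_by_industry LIMIT 10").show()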
