<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data Lakehouse architecture with Azure Databricks and Unity Catalog in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/data-lakehouse-architecture-with-azure-databricks-and-unity/m-p/122877#M46894</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/143693"&gt;@Pratikmsbsvm&lt;/a&gt;&amp;nbsp;, from what I understand, you have a lakehouse on Azure Databricks and would like to share its data with another Databricks account or workspace.&amp;nbsp;If Unity Catalog is enabled on your Azure Databricks account, you can leverage Delta Sharing to securely share the data with other Databricks accounts.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta-sharing/" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-sharing/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Feel free to post if this does not answer your question or if you need any specific details regarding this solution.&lt;/P&gt;</description>
    <pubDate>Wed, 25 Jun 2025 18:44:08 GMT</pubDate>
    <dc:creator>KaranamS</dc:creator>
    <dc:date>2025-06-25T18:44:08Z</dc:date>
    <item>
      <title>Data Lakehouse architecture with Azure Databricks and Unity Catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/data-lakehouse-architecture-with-azure-databricks-and-unity/m-p/122858#M46885</link>
      <description>&lt;P&gt;I am creating a data lakehouse solution on Azure Databricks.&lt;/P&gt;&lt;P&gt;Sources: SAP, Salesforce, Adobe&lt;/P&gt;&lt;P&gt;Targets: Hightouch (external application), Mad Mobile (external application)&lt;/P&gt;&lt;P&gt;The data lakehouse also holds transactional records, which should be stored in ACID-compliant storage.&lt;/P&gt;&lt;P&gt;The real challenge is that there is one more Databricks instance, in a separate workspace,&lt;/P&gt;&lt;P&gt;and that also requires data from the&amp;nbsp;data lakehouse.&lt;/P&gt;&lt;P&gt;Could someone please advise on what the architecture should look like?&lt;/P&gt;&lt;P&gt;Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jun 2025 16:08:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-lakehouse-architecture-with-azure-databricks-and-unity/m-p/122858#M46885</guid>
      <dc:creator>Pratikmsbsvm</dc:creator>
      <dc:date>2025-06-25T16:08:02Z</dc:date>
    </item>
    <item>
      <title>Re: Data Lakehouse architecture with Azure Databricks and Unity Catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/data-lakehouse-architecture-with-azure-databricks-and-unity/m-p/122871#M46892</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/143693"&gt;@Pratikmsbsvm&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;You can leverage the architecture below for your solution.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Your Setup at a Glance&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Sources&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;SAP, Salesforce, Adobe&lt;/STRONG&gt; (Structured &amp;amp; Semi-structured)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Targets&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Hightouch, Mad Mobile&lt;/STRONG&gt; (External downstream apps needing curated data)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Core Requirement&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Data must be stored in &lt;STRONG&gt;ACID-compliant&lt;/STRONG&gt; format → &lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Use &lt;STRONG&gt;Delta Lake&lt;/STRONG&gt; (managed tables are ideal; if there are company constraints, an external location will also work)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Cross-Workspace Data Sharing&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Another &lt;STRONG&gt;Databricks instance (separate workspace)&lt;/STRONG&gt; needs access to this lakehouse data&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;&lt;SPAN&gt;Recommended Architecture (High-Level View)&lt;/SPAN&gt;&lt;/H3&gt;&lt;P class="lia-align-center"&gt;&lt;SPAN&gt;[ SAP / Salesforce / Adobe ]&lt;BR /&gt;│&lt;BR /&gt;▼&lt;BR /&gt;Ingestion Layer (via ADF / Synapse / Partner Connectors / REST API)&lt;BR /&gt;│&lt;BR /&gt;▼&lt;BR /&gt;┌───────────────────────────┐&lt;BR /&gt;│ Azure Data Lake Gen2 │ (&lt;STRONG&gt;Storage&lt;/STRONG&gt; layer - centralized)&lt;BR /&gt;│ + Delta Lake for ACID │&lt;BR /&gt;└───────────────────────────┘&lt;BR /&gt;│&lt;BR /&gt;▼&lt;BR /&gt;Azure Databricks 
(&lt;STRONG&gt;Primary&lt;/STRONG&gt; Workspace)&lt;BR /&gt;├─ Bronze: Raw Data&lt;BR /&gt;├─ Silver: Cleaned &amp;amp; Transformed&lt;BR /&gt;└─ Gold: Aggregated / Business Logic Applied&lt;BR /&gt;│&lt;BR /&gt;├──&amp;gt; &lt;STRONG&gt;Load to&lt;/STRONG&gt; Hightouch / Mad Mobile (via REST APIs / Hightouch Sync)&lt;BR /&gt;└──&amp;gt; Share curated Delta Tables to Other Databricks Workspace (via Delta Sharing or External Table Mount)&lt;/SPAN&gt;&lt;/P&gt;&lt;H2&gt;Key Components &amp;amp; Patterns&lt;/H2&gt;&lt;H3&gt;1. &lt;STRONG&gt;Ingestion Options&lt;/STRONG&gt;&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use &lt;STRONG&gt;Azure Data Factory&lt;/STRONG&gt; or &lt;STRONG&gt;Partner Connectors&lt;/STRONG&gt; (like Fivetran, which we often use in our projects) to ingest data from:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;SAP&lt;/STRONG&gt; → via OData / RFC connectors&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Salesforce&lt;/STRONG&gt; → via REST/Bulk API&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Adobe&lt;/STRONG&gt; → via API or S3 data export&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;2. &lt;STRONG&gt;Storage &amp;amp; Processing Layer&lt;/STRONG&gt;&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Store all raw and processed data in &lt;STRONG&gt;ADLS Gen2&lt;/STRONG&gt;, with &lt;STRONG&gt;Delta Lake format&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Organize Lakehouse zones:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Bronze&lt;/STRONG&gt;: Raw ingested files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Silver&lt;/STRONG&gt;: Cleaned &amp;amp; de-duplicated&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Gold&lt;/STRONG&gt;: Ready for consumption (BI / API sync)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;3. 
&lt;STRONG&gt;Cross-Workspace Databricks Access (This Is Your Core Challenge)&lt;/STRONG&gt;&lt;/H3&gt;&lt;H4&gt;Option A: &lt;STRONG&gt;Delta Sharing&lt;/STRONG&gt; (Recommended if in different orgs/subscriptions)&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Securely share Delta tables&lt;/STRONG&gt; from one workspace to another without copying data&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Works across different cloud accounts&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;Option B: &lt;STRONG&gt;Mount or Access the ADLS Storage Account via a Service Principal&lt;/STRONG&gt; (Only if workspaces are under the same Azure AD tenant)&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Mount or access the same &lt;STRONG&gt;ADLS Gen2&lt;/STRONG&gt; storage in both Databricks workspaces via a service principal&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The other workspace can directly access the tables if permissions are aligned via groups (managed in the Databricks account console)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;Option C: &lt;STRONG&gt;Data Replication with Jobs&lt;/STRONG&gt;&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Periodically &lt;STRONG&gt;replicate key Delta tables&lt;/STRONG&gt; to the secondary Databricks instance using jobs or Auto Loader&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H2&gt;Governance / Security Recommendations&lt;/H2&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use &lt;STRONG&gt;Unity Catalog&lt;/STRONG&gt; (if available) for fine-grained access control&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Encrypt data at rest (ADLS) and in transit&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Use &lt;STRONG&gt;service principals&lt;/STRONG&gt; or &lt;STRONG&gt;managed identities&lt;/STRONG&gt; for secure access between services&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;&lt;SPAN&gt;Summary Visual (Simplified)&lt;/SPAN&gt;&lt;/H3&gt;&lt;PRE&gt;&lt;SPAN&gt; Sources →           Ingestion →    Delta Lakehouse →            Destinations&lt;BR /&gt;[SAP, SFDC, Adobe]   [ADF, 
APIs]    [Bronze, Silver, Gold]      [Hightouch, Mad Mobile, Other DBX]&lt;BR /&gt;                                      ▲&lt;BR /&gt;                                      │&lt;BR /&gt;                                  Cross-Workspace Access (Delta Sharing / Mounting / Jobs)&lt;BR /&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;Let me know if this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jun 2025 17:59:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-lakehouse-architecture-with-azure-databricks-and-unity/m-p/122871#M46892</guid>
      <dc:creator>CURIOUS_DE</dc:creator>
      <dc:date>2025-06-25T17:59:25Z</dc:date>
    </item>
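Option B in the reply above can be sketched concretely. A minimal sketch of the Spark configuration the secondary workspace would set to read the primary lakehouse's ADLS Gen2 storage with a service principal; the storage account, client id, secret, and tenant values below are hypothetical placeholders.

```python
# Sketch of Option B: direct access to the shared ADLS Gen2 account
# from the second Databricks workspace via a service principal.
# All account, client, and tenant values below are placeholders.

def abfss_oauth_conf(storage_account: str, client_id: str,
                     client_secret: str, tenant_id: str) -> dict:
    """Build the Spark conf entries for OAuth access to one ADLS Gen2 account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

conf = abfss_oauth_conf("mylakehouse", "app-client-id",
                        "app-client-secret", "tenant-id")

# On a Databricks cluster you would apply these and then read the table:
# for key, value in conf.items():
#     spark.conf.set(key, value)
# df = spark.read.format("delta").load(
#     "abfss://gold@mylakehouse.dfs.core.windows.net/customer_metrics")
```

In a real workspace the client secret would be fetched from a secret scope (e.g. via dbutils.secrets.get) rather than hard-coded, and with Unity Catalog this per-cluster configuration is typically replaced by storage credentials and external locations.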
    <item>
      <title>Re: Data Lakehouse architecture with Azure Databricks and Unity Catalog</title>
      <link>https://community.databricks.com/t5/data-engineering/data-lakehouse-architecture-with-azure-databricks-and-unity/m-p/122877#M46894</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/143693"&gt;@Pratikmsbsvm&lt;/a&gt;&amp;nbsp;, from what I understand, you have a lakehouse on Azure databricks and would like to share this data with another databricks account or workspace.&amp;nbsp;If Unity Catalog is enabled on your Azure databricks account, you can leverage Delta Sharing to securely share the data with other databricks accounts.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/delta-sharing/" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-sharing/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Feel free to post if this does not answer your question or you need any specific details regarding this solution&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jun 2025 18:44:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-lakehouse-architecture-with-azure-databricks-and-unity/m-p/122877#M46894</guid>
      <dc:creator>KaranamS</dc:creator>
      <dc:date>2025-06-25T18:44:08Z</dc:date>
    </item>
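The Delta Sharing route recommended above can be sketched as the provider-side setup. A minimal sketch that builds the Unity Catalog DDL statements a notebook in the sharing (primary) workspace would run; the share, table, recipient, and sharing-identifier values are hypothetical placeholders.

```python
# Sketch of provider-side Delta Sharing setup (Databricks-to-Databricks),
# expressed as the SQL a notebook in the primary workspace would execute.
# Share, table, recipient, and sharing-identifier names are placeholders.

def delta_sharing_setup_sql(share: str, table: str, recipient: str,
                            sharing_id: str) -> list:
    """Return the DDL statements to share one table with one recipient."""
    return [
        f"CREATE SHARE IF NOT EXISTS {share}",
        f"ALTER SHARE {share} ADD TABLE {table}",
        # For Databricks-to-Databricks sharing, the recipient is identified
        # by the other metastore's sharing identifier.
        f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_id}'",
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]

statements = delta_sharing_setup_sql(
    "gold_share", "main.gold.customer_metrics",
    "secondary_workspace", "azure:eastus2:1234abcd-placeholder")

# On a Unity Catalog-enabled cluster you would run each statement:
# for stmt in statements:
#     spark.sql(stmt)
```

On the consuming side, the shared tables then surface under a catalog created from the share (CREATE CATALOG ... USING SHARE provider.share_name), after which they can be queried like any other Unity Catalog table.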
  </channel>
</rss>

