<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data Transfer using Unity Catalog full implementation in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128314#M48206</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;: Thanks a lot. Will this solution also work when the&amp;nbsp;&lt;STRONG&gt;Provider&lt;/STRONG&gt;&amp;nbsp;and&amp;nbsp;&lt;STRONG&gt;Recipient&lt;/STRONG&gt;&amp;nbsp;have different accounts?&lt;/P&gt;&lt;P&gt;Kindly suggest.&lt;/P&gt;</description>
    <pubDate>Wed, 13 Aug 2025 09:29:35 GMT</pubDate>
    <dc:creator>Datalight</dc:creator>
    <dc:date>2025-08-13T09:29:35Z</dc:date>
    <item>
      <title>Data Transfer using Unity Catalog full implementation</title>
      <link>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128218#M48182</link>
      <description>&lt;DIV&gt;I have to share data between Azure A and Azure B using Unity Catalog and Delta Sharing.&lt;/DIV&gt;&lt;DIV&gt;Every time data arrives in Azure A, the same data should be readable by Azure B.&lt;/DIV&gt;&lt;DIV&gt;How do I handle the incremental load? For changed records I think I need to use a MERGE statement.&lt;/DIV&gt;&lt;DIV&gt;Could someone please help me with detailed steps on how to implement this in production?&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Kindly share your knowledge.&lt;/DIV&gt;&lt;DIV&gt;Many thanks.&lt;/DIV&gt;</description>
      <pubDate>Tue, 12 Aug 2025 12:56:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128218#M48182</guid>
      <dc:creator>Datalight</dc:creator>
      <dc:date>2025-08-12T12:56:32Z</dc:date>
    </item>
    <item>
      <title>Re: Data Transfer using Unity Catalog full implementation</title>
      <link>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128273#M48196</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179126"&gt;@Datalight&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a common production pattern. Below I’ll give you a clear, practical end-to-end plan&lt;BR /&gt;(architecture + production best practices) for sharing live Delta tables from Azure A (provider) to Azure B&lt;BR /&gt;(recipient) using Unity Catalog + Delta Sharing, and how to keep Azure B’s copy up to date&lt;BR /&gt;incrementally (MERGE, Change Data Feed, etc.).&lt;/P&gt;&lt;P&gt;I’ll first outline the architecture and prerequisites, then a step-by-step implementation,&lt;BR /&gt;followed by production hardening and monitoring.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1) Architecture &amp;amp; design choices (high level)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Provider (Azure A)&lt;/STRONG&gt;: owns the source Delta tables (canonical data) in Unity Catalog.&lt;BR /&gt;Creates a Share that exposes selected tables/views to recipients via Delta Sharing. The provider remains authoritative.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Recipient (Azure B)&lt;/STRONG&gt;: consumes the shared tables. Recipients can read the provider’s tables live (read-only),&lt;BR /&gt;or copy into local tables and apply incremental updates (MERGE/CDC) to maintain a local, query-optimized store.&lt;BR /&gt;Databricks-to-Databricks sharing with Unity Catalog lets a recipient on a Unity Catalog-enabled workspace read&lt;BR /&gt;the provider’s shared data without copying the underlying files.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Incremental strategy options&lt;/STRONG&gt; (choose one based on use case):&lt;BR /&gt;&lt;STRONG&gt;1. Direct read (no copy)&lt;/STRONG&gt; — B queries the provider table directly. No incremental job required; always reads the latest data. Good if B just needs reads and latency is acceptable.&lt;BR /&gt;&lt;STRONG&gt;2. Pull + MERGE (recommended if B needs a local table / joins / performance)&lt;/STRONG&gt; — B periodically reads new/changed rows from the provider and MERGEs them into local Delta tables. Use a robust CDC mechanism for the diff: Delta Change Data Feed (CDF) or a provider-maintained staging table of changes.&lt;BR /&gt;&lt;STRONG&gt;3. Provider pushes updates&lt;/STRONG&gt; — the provider computes and places a delta/staging table in the share; the recipient just applies it. (Similar to 2, but pushes responsibility to the provider.)&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2) Prerequisites &amp;amp; permissions&lt;/STRONG&gt;&lt;BR /&gt;Both workspaces must be Unity Catalog enabled (metastore). The provider workspace must have Unity Catalog and be able to create shares. The recipient should be a Unity Catalog-enabled workspace (or an external recipient if needed).&lt;BR /&gt;Provider: CREATE SHARE privilege or metastore admin. Recipient: granted access to the share.&lt;BR /&gt;Networking: ensure any private endpoints / VNet peering and storage firewall rules allow the Databricks managed sharing endpoints (if you use private networking).&lt;BR /&gt;Decide on the authentication method for the recipient: Databricks-to-Databricks Delta Sharing uses Unity Catalog-based recipients (no separate storage credentials required).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3) Step-by-step implementation (Provider = Azure A, Recipient = Azure B)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;On Azure A (Provider)&lt;/STRONG&gt;&lt;BR /&gt;- Enable Unity Catalog (if not already): create/assign the metastore and attach the workspace.&lt;BR /&gt;- Prepare the canonical Delta table(s) in a UC catalog + schema.&lt;BR /&gt;- Create a share and add the tables you want to share (see the SQL sketch below).&lt;BR /&gt;- Create a recipient (if Databricks-to-Databricks): either create the recipient object or let the recipient request access and approve it. You can tie it to a particular workspace or external identity. See the docs for exact steps.&lt;BR /&gt;- Grant the recipient SELECT on the share (or accept their request). The recipient will receive access to the live table metadata and data through Delta Sharing.&lt;BR /&gt;&lt;STRONG&gt;On Azure B (Recipient)&lt;/STRONG&gt;&lt;BR /&gt;- Connect to the provider share (Catalog Explorer → Delta Sharing → Add provider, or accept the provider invite). This mounts the provider share as a read-only catalog, and you can query the shared tables directly.&lt;/P&gt;
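&lt;P&gt;A minimal SQL sketch of that setup (Databricks-to-Databricks). All names are hypothetical placeholders (catalog/schema/table main.sales.orders, share sales_share, recipient azure_b_recipient, provider name azure_a_provider), so substitute your own objects and the recipient’s real sharing identifier:&lt;/P&gt;&lt;PRE&gt;-- On Azure A (provider)
CREATE SHARE IF NOT EXISTS sales_share COMMENT 'Orders shared to Azure B';

-- WITH HISTORY also exposes the table history/CDF to the recipient (needed for table_changes later)
ALTER SHARE sales_share ADD TABLE main.sales.orders WITH HISTORY;

-- Databricks-to-Databricks: the recipient is identified by its metastore sharing identifier
CREATE RECIPIENT IF NOT EXISTS azure_b_recipient
  USING ID 'azure:westeurope:aaaabbbb-1111-2222-3333-ccccddddeeee';

GRANT SELECT ON SHARE sales_share TO RECIPIENT azure_b_recipient;

-- On Azure B (recipient): find the provider name, then mount the share as a read-only catalog
SHOW PROVIDERS;
CREATE CATALOG IF NOT EXISTS sales_shared USING SHARE azure_a_provider.sales_share;
SELECT * FROM sales_shared.sales.orders LIMIT 10;&lt;/PRE&gt;&lt;P&gt;The share and recipient live at the metastore level, so the provider-side statements need a metastore admin or a user with the CREATE SHARE / CREATE RECIPIENT privileges.&lt;/P&gt;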
&lt;P&gt;&lt;STRONG&gt;4) Implementing Incremental Load (recommended: CDF + MERGE)&lt;/STRONG&gt;&lt;BR /&gt;I recommend enabling Change Data Feed (CDF) on the provider table plus a scheduled job on the recipient that reads&lt;BR /&gt;the CDF since a given version/time and then MERGEs it into the local table. This gives accurate row-level inserts/updates/deletes.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Typical flow&lt;/STRONG&gt;&lt;BR /&gt;- The provider enables CDF on the canonical table (done on Azure A).&lt;BR /&gt;- The recipient job keeps track of last_processed_version or last_processed_timestamp for each shared table.&lt;BR /&gt;- On each run the recipient reads the changes from the provider share using table_changes('provider_catalog.schema.table', start_version) or table_changes('provider_catalog.schema.table', start_timestamp) (Databricks CDF SQL functions), or by reading _change_type via VERSION AS OF / the CDF view. It then performs a MERGE INTO against the local target, as in the sketch below.&lt;/P&gt;
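&lt;P&gt;A sketch of that loop in SQL, again with hypothetical names: the shared table sales_shared.sales.orders from above, a local target main.local.orders_copy keyed on order_id with columns (order_id, customer_id, amount, updated_at), and a hard-coded start version 42 standing in for the value you would read from your own checkpoint table:&lt;/P&gt;&lt;PRE&gt;-- On Azure A (provider): enable Change Data Feed on the canonical table
ALTER TABLE main.sales.orders SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- On Azure B (recipient): read changes since the last processed version and keep
-- only the latest change per key, so the MERGE stays deterministic and idempotent
CREATE OR REPLACE TEMP VIEW orders_changes AS
SELECT order_id, customer_id, amount, updated_at, _change_type
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY _commit_version DESC) AS rn
  FROM table_changes('sales_shared.sales.orders', 42)
  WHERE _change_type != 'update_preimage'
)
WHERE rn = 1;

-- Apply inserts, updates and deletes to the local copy
MERGE INTO main.local.orders_copy AS t
USING orders_changes AS s
ON t.order_id = s.order_id
WHEN MATCHED AND s._change_type = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET
  t.customer_id = s.customer_id,
  t.amount      = s.amount,
  t.updated_at  = s.updated_at
WHEN NOT MATCHED AND s._change_type != 'delete' THEN INSERT
  (order_id, customer_id, amount, updated_at)
  VALUES (s.order_id, s.customer_id, s.amount, s.updated_at);&lt;/PRE&gt;&lt;P&gt;After a successful run, persist the highest _commit_version you processed in your checkpoint table and use the next version as the start of the following run.&lt;/P&gt;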
&lt;P&gt;&lt;STRONG&gt;5) Production hardening &amp;amp; best practices&lt;/STRONG&gt;&lt;BR /&gt;- Primary key / dedupe logic: ensure stable unique keys (surrogate or natural) so the MERGE can be deterministic.&lt;BR /&gt;- Idempotency: design the MERGE to be idempotent. Keep last_processed_version and retry safely. Avoid relying solely on timestamps if clocks differ; prefer Delta table versions.&lt;BR /&gt;- Small target partitions: co-locate keys to avoid large merges scanning the entire table. Use partition pruning where possible. See the Delta best practices for speeding up MERGE.&lt;BR /&gt;- OPTIMIZE &amp;amp; ZORDER after large merges for read performance.&lt;BR /&gt;- Schema evolution: enable controlled MERGE schema evolution or handle it with separate migration jobs. Keep producer and consumer schemas compatible.&lt;BR /&gt;- Handle deletes: CDF exposes deletes as change type delete. Ensure the MERGE includes delete logic or soft-delete flags.&lt;BR /&gt;- Batch size &amp;amp; concurrency: tune how many versions you process per run. Split very large change batches. Avoid overlapping runs by using job locks.&lt;BR /&gt;- Security &amp;amp; governance: use Unity Catalog privileges, audit logs, and grant least privilege on shares. Monitor who the recipient is and which tables are shared.&lt;BR /&gt;- Backfill / cutover: for the initial load, copy a full snapshot into the recipient’s local table and record the version you started from, then use CDF from that version+1 onward.&lt;BR /&gt;- Monitoring &amp;amp; alerting: instrument job success/failure, row counts, lag (version/time), and reconcile counts against the provider. Add SLA alerts.&lt;BR /&gt;- Testing: unit tests for the merge logic, integration tests with small synthetic data, chaos tests for incomplete writes, and DR tests for table restores (Time Travel).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;6) Operational checklist before going live&lt;/STRONG&gt;&lt;BR /&gt;- Unity Catalog enabled on both workspaces.&lt;BR /&gt;- Provider share created and the recipient granted access.&lt;BR /&gt;- CDF enabled on source tables (if using CDF).&lt;BR /&gt;- last_processed_version/timestamp persisted in a durable table per consumer job.&lt;BR /&gt;- MERGE job implemented, idempotent, and tested (unit + integration).&lt;BR /&gt;- Job scheduling (Databricks Jobs / Airflow / ADF) with retries and locks.&lt;BR /&gt;- Monitoring, alerting, and cost estimates validated.&lt;BR /&gt;- Security review (Unity Catalog privileges, network controls).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;7) Helpful links (official docs)&lt;/STRONG&gt;&lt;BR /&gt;Set up Delta Sharing (Azure Databricks provider): Microsoft Learn.&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta-sharing/set-up" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/delta-sharing/set-up&lt;/A&gt;&lt;BR /&gt;Create/manage recipients for Delta Sharing (Unity Catalog): Microsoft Learn.&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta-sharing/create-recipient" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/delta-sharing/create-recipient&lt;/A&gt;&lt;BR /&gt;Databricks-to-Databricks Delta Sharing overview (how the recipient reads shared tables).&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/delta-sharing/share-data-databricks" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-sharing/share-data-databricks&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Delta Lake Change Data Feed (CDF) documentation.&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/delta-change-data-feed" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/delta-change-data-feed&lt;/A&gt;&lt;BR /&gt;MERGE INTO (Delta Lake upsert) docs and best practices.&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/merge" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/merge&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Aug 2025 17:55:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128273#M48196</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-08-12T17:55:11Z</dc:date>
    </item>
    <item>
      <title>Re: Data Transfer using Unity Catalog full implementation</title>
      <link>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128314#M48206</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;: Thanks a lot. Will this solution also work when the&amp;nbsp;&lt;STRONG&gt;Provider&lt;/STRONG&gt;&amp;nbsp;and&amp;nbsp;&lt;STRONG&gt;Recipient&lt;/STRONG&gt;&amp;nbsp;have different accounts?&lt;/P&gt;&lt;P&gt;Kindly suggest.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Aug 2025 09:29:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128314#M48206</guid>
      <dc:creator>Datalight</dc:creator>
      <dc:date>2025-08-13T09:29:35Z</dc:date>
    </item>
    <item>
      <title>Re: Data Transfer using Unity Catalog full implementation</title>
      <link>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128338#M48212</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179126"&gt;@Datalight&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, the Unity Catalog + Delta Sharing approach I outlined works even if the provider and recipient are in completely different Azure accounts (or even different clouds),&lt;BR /&gt;as long as the following conditions are met:&lt;BR /&gt;1. Unity Catalog enabled in both workspaces&lt;BR /&gt;2. Share &amp;amp; recipient configuration&lt;BR /&gt;3. Networking &amp;amp; firewall rules&lt;BR /&gt;4. Permissions&lt;BR /&gt;5. Data format &amp;amp; features&lt;BR /&gt;6. Supported regions&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 13 Aug 2025 12:49:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128338#M48212</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-08-13T12:49:15Z</dc:date>
    </item>
    <item>
      <title>Re: Data Transfer using Unity Catalog full implementation</title>
      <link>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128434#M48241</link>
      <description>&lt;P&gt;This works well once set up. If your Azure environment is locked down, you will need to grant a Private Link (private endpoint) to the underlying storage so the other party’s service can read the data. For enhanced security, I’d recommend that the catalog shared with the other party be backed by its own external storage, to keep it segregated.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Aug 2025 09:02:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/data-transfer-using-unity-catalog-full-implementation/m-p/128434#M48241</guid>
      <dc:creator>turagittech</dc:creator>
      <dc:date>2025-08-14T09:02:37Z</dc:date>
    </item>
  </channel>
</rss>

