<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Delta sharing speed in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-sharing-speed/m-p/82569#M36686</link>
    <description>&lt;P&gt;Hi - I am comparing the performance of delta shared tables and the speed is 10X slower than when querying locally.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Scenario:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I am using a 2XS serverless SQL warehouse, and have a table with 15M rows and 10 columns, using the below query:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select date, count(*) as num_rows, sum(spend) as total_spend
from catalog.schema.table
group by date
order by 1&lt;/LI-CODE&gt;&lt;P&gt;I have an account on AWS us-east-1 and AWS us-west-2 for testing.&amp;nbsp; I am using an R2 bucket in ENAM for the share.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Test:&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I run on the normal delta table in account 1, this returns in 1 second.&lt;/P&gt;&lt;P&gt;If I deep clone into an R2 bucket and then query the deep cloned table, that also returns in 1 second.&lt;/P&gt;&lt;P&gt;If I delta share the R2 table to account 2, and then query there, that returns in &lt;STRONG&gt;10 seconds&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;If I create a copy of the shared table in account 2, that returns in 1 second.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Is this speed difference expected? Am I doing something wrong, or is it best practice to copy delta shared tables to local storage (defeating a big benefit of delta sharing)?&lt;/P&gt;</description>
    <pubDate>Fri, 09 Aug 2024 14:23:02 GMT</pubDate>
    <dc:creator>turtleXturtle</dc:creator>
    <dc:date>2024-08-09T14:23:02Z</dc:date>
    <item>
      <title>Delta sharing speed</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-sharing-speed/m-p/82569#M36686</link>
      <description>&lt;P&gt;Hi - I am comparing the performance of delta shared tables and the speed is 10X slower than when querying locally.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Scenario:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I am using a 2XS serverless SQL warehouse, and have a table with 15M rows and 10 columns, using the below query:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select date, count(*) as num_rows, sum(spend) as total_spend
from catalog.schema.table
group by date
order by 1&lt;/LI-CODE&gt;&lt;P&gt;I have an account on AWS us-east-1 and AWS us-west-2 for testing.&amp;nbsp; I am using an R2 bucket in ENAM for the share.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Test:&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I run on the normal delta table in account 1, this returns in 1 second.&lt;/P&gt;&lt;P&gt;If I deep clone into an R2 bucket and then query the deep cloned table, that also returns in 1 second.&lt;/P&gt;&lt;P&gt;If I delta share the R2 table to account 2, and then query there, that returns in &lt;STRONG&gt;10 seconds&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;If I create a copy of the shared table in account 2, that returns in 1 second.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Is this speed difference expected? Am I doing something wrong, or is it best practice to copy delta shared tables to local storage (defeating a big benefit of delta sharing)?&lt;/P&gt;</description>
      <pubDate>Fri, 09 Aug 2024 14:23:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-sharing-speed/m-p/82569#M36686</guid>
      <dc:creator>turtleXturtle</dc:creator>
      <dc:date>2024-08-09T14:23:02Z</dc:date>
    </item>
    <item>
      <title>Re: Delta sharing speed</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-sharing-speed/m-p/139316#M51152</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Yes, the speed difference you are seeing when querying Delta Shared tables versus local Delta tables is expected due to the architectural nature of Delta Sharing and network constraints.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Why Delta Sharing Is Slower&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When you query a standard Delta table locally, your compute cluster accesses the underlying data in the same storage environment, benefiting from data skipping, caching, and low-latency access patterns. However, with Delta Sharing, queries run on&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;EM&gt;foreign&lt;/EM&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;storage, often across account or even region boundaries. This means:&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Data must be read over the network with each query, introducing additional latency and lower bandwidth compared to local disk or in-region storage.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Delta Sharing does not use local caching in the compute warehouse for the shared data, so every access involves "cold" reads from the source, whereas a cloned or copied Delta table can use native data caching and an optimized layout.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For large datasets, network overhead and the lack of partition/file caching become even more significant bottlenecks.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
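&lt;P&gt;To make the bullets above concrete, here is a rough back-of-envelope sketch in Python. The model itself and every number in it (file count, table size, per-file latency, bandwidth, parallelism, fixed overhead) are hypothetical assumptions chosen for illustration, not measurements of Delta Sharing or of any warehouse.&lt;/P&gt;

```python
# Back-of-envelope model of scan time. Every parameter below is a
# hypothetical assumption, not a measured Delta Sharing figure.

def estimate_scan_seconds(num_files, total_mb, per_file_latency_s,
                          bandwidth_mb_per_s, parallelism,
                          fixed_overhead_s=0.0):
    """Estimate wall-clock time to scan a table stored as num_files files."""
    # Each file read pays a request round trip; reads are spread over workers.
    request_time = num_files * per_file_latency_s / parallelism
    # Raw transfer time is bounded by the available bandwidth.
    transfer_time = total_mb / bandwidth_mb_per_s
    return fixed_overhead_s + request_time + transfer_time


# Illustrative inputs only: 64 files, 512 MB in total.
local = estimate_scan_seconds(64, 512, per_file_latency_s=0.02,
                              bandwidth_mb_per_s=1024, parallelism=8)
shared = estimate_scan_seconds(64, 512, per_file_latency_s=0.25,
                               bandwidth_mb_per_s=128, parallelism=8,
                               fixed_overhead_s=1.0)
print(f"local:  {local:.2f} s")
print(f"shared: {shared:.2f} s")
```

&lt;P&gt;With these made-up inputs the shared read comes out roughly ten times slower than the local one. Changing any input changes that ratio, so treat this as a way to reason about where the time could go, not as a prediction.&lt;/P&gt;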
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Is This Best Practice?&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;It depends on the workload. For highly interactive or latency-sensitive workloads, this overhead is a known tradeoff. The main benefits of Delta Sharing are up-to-date data access and not having to copy large datasets repeatedly. However, for performance-critical cases, Databricks and broader community recommendations are:&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;If you must run many production queries or require low-latency response times, make a local copy (materialized table or deep clone) in the consuming account. This defeats some of the "no-copy" appeal, but gives you local data skipping, caching, and optimized performance.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For exploratory analytics or ad hoc reporting where slightly higher latencies are acceptable, querying the shared Delta table directly is reasonable.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The most common best practice: create materialized views or periodically copy data for frequent, performance-sensitive workloads; reserve live Delta Sharing for less frequent, up-to-date, or cross-org scenarios.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
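&lt;P&gt;As a minimal sketch of the "periodically copy data" option, the Python below builds a CREATE OR REPLACE TABLE ... AS SELECT statement that materializes a shared table into a local one. The helper names and the catalog, schema, and table names are hypothetical placeholders, and whether a full replace on each refresh suits your data volume is an assumption you would need to check.&lt;/P&gt;

```python
# Sketch: build a CTAS statement that materializes a shared table locally.
# All catalog, schema, and table names used here are hypothetical.

def check_name(three_part_name):
    """Accept only plain catalog.schema.table names to avoid unsafe SQL."""
    parts = three_part_name.split(".")
    if len(parts) != 3 or not all(p.isidentifier() for p in parts):
        raise ValueError(
            f"expected catalog.schema.table, got {three_part_name!r}")
    return three_part_name


def build_refresh_sql(shared_table, local_table):
    """Return SQL that replaces the local copy with the current shared data."""
    return (
        f"CREATE OR REPLACE TABLE {check_name(local_table)} AS "
        f"SELECT * FROM {check_name(shared_table)}"
    )


sql = build_refresh_sql("shared_catalog.sales.spend",
                        "local_catalog.sales.spend")
print(sql)
```

&lt;P&gt;On Databricks, a scheduled job or notebook could pass the resulting string to spark.sql(sql). The copy is then only as fresh as the last run, which is the tradeoff against querying the share directly.&lt;/P&gt;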
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Additional Notes&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Cross-region Delta Sharing is even slower due to inter-region bandwidth limits and should be avoided for production workloads requiring fast queries.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The size and frequency of your queries matter: smaller, more selective queries (with filters/partitions) perform much better than full-table scans through Delta Sharing.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Deep cloned or locally copied versions of the table will typically have lower latency and can use optimizations such as data skipping, the Delta cache, and parallel access.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
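&lt;P&gt;To illustrate the point about selective queries, here is a small Python sketch that rebuilds the query from the question with a date-range filter. The function name and the example dates are hypothetical, and whether the filter actually reduces the files read depends on how the shared table is laid out (for example, partitioned or clustered on date), which is an assumption here.&lt;/P&gt;

```python
from datetime import date

# Sketch: the question's aggregation, restricted to a date range so that
# fewer rows (and, with a suitable table layout, fewer files) are scanned.

def build_daily_spend_query(table, start, end):
    """Aggregate spend per day for dates from start to end, inclusive."""
    return (
        "select date, count(*) as num_rows, sum(spend) as total_spend\n"
        f"from {table}\n"
        f"where date between '{start.isoformat()}' and '{end.isoformat()}'\n"
        "group by date\n"
        "order by 1"
    )


# Hypothetical one-week window on the table name used in the question.
query = build_daily_spend_query("catalog.schema.table",
                                date(2024, 8, 1), date(2024, 8, 7))
print(query)
```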
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;In summary, your results are typical and reflect the expected tradeoff between the convenience of sharing and the underlying performance characteristics. For best performance, copy or deep clone shared tables to local storage for repeated or critical queries.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 11:49:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-sharing-speed/m-p/139316#M51152</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-11-17T11:49:59Z</dc:date>
    </item>
  </channel>
</rss>

