
Delta sharing speed

turtleXturtle
New Contributor II

Hi - I am comparing the performance of Delta Shared tables, and queries against them are 10x slower than against the same table queried locally.

Scenario:

I am using a 2XS serverless SQL warehouse, and have a table with 15M rows and 10 columns, using the below query:

select date, count(*) as num_rows, sum(spend) as total_spend
from catalog.schema.table
group by date
order by 1

I have accounts on AWS us-east-1 and AWS us-west-2 for testing. I am using an R2 bucket in ENAM for the share.

Test: 

If I run on the normal delta table in account 1, this returns in 1 second.

If I deep clone into an R2 bucket and then query the deep cloned table, that also returns in 1 second.

If I delta share the R2 table to account 2, and then query there, that returns in 10 seconds.

If I create a copy of the shared table in account 2, that returns in 1 second.

Question

Is this speed difference expected? Am I doing something wrong, or is the best practice to copy Delta Shared tables to local storage (defeating a big benefit of Delta Sharing)?

1 REPLY

mark_ott
Databricks Employee

Yes, the speed difference you are seeing when querying Delta Shared tables versus local Delta tables is expected due to the architectural nature of Delta Sharing and network constraints.

Why Delta Sharing Is Slower

When you query a standard Delta table locally, your compute reads the underlying data from storage in the same environment, benefiting from data skipping, caching, and low-latency access. With Delta Sharing, queries read from the provider's storage, often across account or even region boundaries. This means:

  • Data must be read over the network with each query, introducing additional latency and lower bandwidth compared to local disk or in-region storage.

  • Delta Sharing does not use local caching in the compute warehouse for the shared data, so every access is a "cold" read from the source, whereas a cloned or copied Delta table can use native data caching and an optimized layout.

  • For large datasets, network overhead and the lack of partition/file caching become even more significant bottlenecks.

Is This Best Practice?

It depends on the workload. For highly interactive or latency-sensitive workloads, this overhead is a known tradeoff. The main benefits of Delta Sharing are access to up-to-date data and not having to copy large datasets repeatedly. For performance-critical cases, the recommendations from Databricks and the broader community are:

  • If you must run many production queries or require low-latency response times, make a local copy (materialized table or deep clone) in the consuming account. This gives up some of the "no-copy" appeal, but gives you local data skipping, caching, and optimized performance.

  • For exploratory analytics or ad hoc reporting where slightly higher latencies are acceptable, querying the shared Delta table directly is reasonable.

  • The most common practice: create materialized views or periodically copy data for frequent, performance-sensitive workloads; reserve live Delta Sharing for less frequent, up-to-date, or cross-org scenarios.
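
The local-copy option above could be sketched as follows in the consuming account. This is a minimal sketch, not an official recipe: `shared_catalog.shared_schema.spend` and `local_catalog.local_schema.spend_copy` are hypothetical names, and the statement is a full rewrite of the copy that would need to be scheduled (for example, as a job) at whatever cadence the data's freshness requires.

```sql
-- Hypothetical names: replace with the shared table and a target table you own.
-- Full refresh of a local copy of the shared table in the consuming account.
CREATE OR REPLACE TABLE local_catalog.local_schema.spend_copy AS
SELECT *
FROM shared_catalog.shared_schema.spend;

-- The aggregation from the question, now reading from the local copy.
SELECT date, count(*) AS num_rows, sum(spend) AS total_spend
FROM local_catalog.local_schema.spend_copy
GROUP BY date
ORDER BY 1;
```

A full rewrite is the simplest approach for a table of this size (15M rows); for much larger tables an incremental refresh may be preferable, which this sketch does not cover.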

Additional Notes

  • Cross-region Delta Sharing is even slower due to inter-region bandwidth limits and should be avoided for production workloads requiring fast queries.

  • The size and frequency of your queries matter: smaller, more selective queries (with filters/partitions) perform much better than full-table scans through Delta Sharing.

  • Deep cloned or locally copied versions of the table will generally have lower latency and support optimizations like data skipping, disk cache (formerly Delta cache), and parallel access.
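
To illustrate the point about selective queries, the aggregation from the question can be narrowed with a filter so that less data has to be read from the shared table. This is only a sketch: the table name and the cutoff date are hypothetical, and it assumes `date` is a DATE column. How much it helps depends on how the provider's table is laid out.

```sql
-- Hypothetical table name and cutoff date; assumes `date` is a DATE column.
-- Restricting the date range lets the engine skip files outside the filter
-- instead of scanning the whole shared table.
SELECT date, count(*) AS num_rows, sum(spend) AS total_spend
FROM shared_catalog.shared_schema.spend
WHERE date >= DATE'2024-01-01'
GROUP BY date
ORDER BY date;
```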

In summary, your results are typical and reflect the expected tradeoff between the convenience of sharing and the underlying performance characteristics. For best performance, copy or deep clone shared tables to local storage for repeated or critical queries.
