
Delta sharing speed

turtleXturtle
New Contributor II

Hi - I am comparing the performance of Delta Shared tables against local tables, and queries on the shared table are about 10x slower than the same queries run locally.

Scenario:

I am using a 2XS serverless SQL warehouse and a table with 15M rows and 10 columns, running the query below:

select date, count(*) as num_rows, sum(spend) as total_spend
from catalog.schema.table
group by date
order by 1

I have two accounts for testing, one on AWS us-east-1 and one on AWS us-west-2. I am using an R2 bucket in ENAM (Eastern North America) for the share.

Test: 

If I run the query on the regular Delta table in account 1, it returns in 1 second.

If I deep clone the table into an R2 bucket and query the clone, that also returns in 1 second.

If I Delta Share the R2 table to account 2 and query it there, it returns in 10 seconds.

If I create a copy of the shared table in account 2 and query the copy, it returns in 1 second.
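
For anyone reproducing these numbers, here is a minimal Python sketch of a timing harness that runs the same query several times per table and reports the first run separately from the later ones, so a one-off startup cost is not mistaken for steady-state speed. The names `time_runs`, `summarize`, and `fake_run` are made up for illustration and are not part of any Databricks API. The harness assumes you supply your own function that sends SQL text to the warehouse and waits for the result.

```python
import statistics
import time
from typing import Callable, Dict, List

# The query from the post, with the table name left as a placeholder so the
# same text can be sent to each table variant being compared.
QUERY = """
select date, count(*) as num_rows, sum(spend) as total_spend
from {table}
group by date
order by 1
"""


def time_runs(run_query: Callable[[str], object], table: str, repeats: int = 5) -> List[float]:
    """Run the query `repeats` times against `table`; return wall-clock seconds per run."""
    sql_text = QUERY.format(table=table)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql_text)  # assumed to block until the result is fully returned
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Report the first run apart from the median of the remaining runs."""
    later = timings[1:] or timings  # with a single run, fall back to that run
    return {"first": timings[0], "median_later": statistics.median(later)}


if __name__ == "__main__":
    # Hypothetical stand-in for a real warehouse call; replace it with a
    # function that executes the SQL text on your own connection.
    def fake_run(sql_text: str) -> None:
        time.sleep(0.01)

    print(summarize(time_runs(fake_run, "catalog.schema.table", repeats=3)))
```

Running this once per table (the regular table, the R2 clone, the shared table, and the local copy) would show whether the 10 second figure holds on repeated runs or only on the first one.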

Question:

Is this speed difference expected? Am I doing something wrong, or is the best practice to copy Delta Shared tables to local storage (which defeats a big benefit of Delta Sharing)?

