cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Lakehouse Federation Join Pushdown

Schaubi
New Contributor II

Hi,

I experimented a little bit with lakehouse federation. I created a connection and foreign catalog that references a SQL Server and activated the public preview feature for Join Pushdowns. After finishing my experiments, it seems to me that the feature has the limitation that cross-schema joins are not pushed down to SQL Server. I could not find any documentation about this limitation (https://learn.microsoft.com/en-us/azure/databricks/query-federation/performance-recommendations#join...), so I wanted to ask if someone knows, if this is a known limitation or if my assumption might be false and the problem lies elsewhere.

Example where the Join is pushed down:

select *
from catalog_name.schema1.table_x tx
inner join catalog_name.schema1.table_y ty
    on tx.join_key = ty.join_key

Example where the join is NOT pushed down:

select *
from catalog_name.schema1.table_x tx
inner join catalog_name.schema2.table_y ty
    on tx.join_key = ty.join_key
I'm using Databricks Runtime 17.3 for my experiments.
 
Thanks,
Stefan
 
 
 
1 ACCEPTED SOLUTION

Accepted Solutions

MoJaMa
Databricks Employee
Databricks Employee

I checked with Engineering. Unfortunately this is not supported and they have an action item to update the docs to clarify it for others.

View solution in original post

2 REPLIES 2

MoJaMa
Databricks Employee
Databricks Employee

I checked with Engineering. Unfortunately this is not supported and they have an action item to update the docs to clarify it for others.

StefanSch
New Contributor II

Hi,
I have experimented a bit with join pushdowns and experienced that intra-schema-Joins are not pushed down if there is a table joined between that is part of another schema.

Example:
In the following example the Join between x1 and x2 is pushed down (when the preview feature is activated):

select * 
from catalog_name.schema_x.table_1 x1 
inner join catalog_name.schema_x.table_2 x2 
    on x1.join_key = x2.join_key 
inner join catalog_name.schema_y.table_3 y3 
    on x1.join_key = y3.join_key

Whereas in the following example this join (between x1 and x2) is not pushed down:

select * 
from catalog_name.schema_x.table_1 x1 
inner join catalog_name.schema_y.table_3 y3 
    on x1.join_key = y3.join_key 
inner join catalog_name.schema_x.table_2 x2 
    on x1.join_key = x2.join_key

To me it's not understandable, why the join order matters here. Is this behaviour intended or might this be a bug in the preview feature?

Thanks,
Stefan