I’m facing an issue while working with federated Redshift tables in Databricks, and I’m hoping someone here can help me out.
I have a source table(material) in Redshift that I’m querying through the Delta lake federation in Databricks. when I run the query on the federated table, I’m seeing incorrect results, where as the same query gives the expected result when it is run against the same table stored as Delta table in catalog and also gives expected result when the same query run on the source table (Material) in Redshift directly via Data grip.
Here’s what I’ve tried so far:
(I) I created a Delta table in Databricks, also named material, by reading data from the federated Redshift table and loaded it into the Databricks catalog.
When I run the same query on the Delta table (material), I get the expected, correct output.
(ii) Additionally, when I run the same query in Redshift directly using Datagrip, I get the expected result as well.
(iii) However, when I run the same query on the federated table, it gives incorrect results.
Has anyone faced a similar issue or does anyone know why this discrepancy is occurring between the federated table and the Delta table?
Would really appreciate any suggestions or insights.
Issue Details:
Expected Behavior:
select * from material where material is not null and ((material_group = 'X0450' and material_description like '%FILT%') OR material_group in ('X0500','X0600')))
The query should not return the material "YYYY" because its product group 'X0400' is not included in the filter conditions (BMATLGRP = 'X0450' or BMATLGRP IN ('X0030', 'X0040', 'X0050')).
This behavior is correctly observed when querying the Delta Lake table in the Databricks catalog and the Redshift table directly.
Observed Behavior:
The federated table query in Databricks incorrectly returns the material "YYYY" with product group 'X0400'
Query:
1. Query on Federated Table (Incorrect Results):
Result:
The query returns the material "YYYY" with product group 'X0400', which is incorrect.
When the same above query run on top of Delta lake table and in Redshift directly the query does not return the material "YYYY", which is correct.
I even checked the execution plan of the query in Databricks the predicate pushdown is not happening, I am not sure does the issue is because of this.
Questions:
1) Has anyone encountered similar issues with federated tables returning inconsistent results compared to source systems?
2) Are there any known limitations or configurations for federated tables that could cause this behavior?