Databricks Community

Harsha777 · ‎09-02-2024

Hi Team,

I have a query with below construct in my project

SELECT count(*) FROM `catalog`.`schema`.`t_table`
WHERE _col_check IN (SELECT DISTINCT _col_check FROM `catalog`.`schema`.`t_check_table`)

Actually, there is no column "_col_check" in the sub-query table "t_check_table". (but present in main table "t_table"
The expectation is that the query will fail but interestingly the query is giving the total count of the main table "t_table".
When the sub-query is executed individually, it fails with column not found error.

If I use a column which is not present in the main table then the query fails.

I assume it is using the values of the main table when the column is present in the main table and raising error when the column is not present in both the tables.

I am not sure if I am over-looking anything here or is it an issue, could someone comment of any such observations?

filipniziol · ‎09-02-2024

Hi @Harsha777 ,

What occurs is called column shadowing.
What happens is that the column names in main query and sub query are identica and the databricks engine after not finding it in sub query searches in the main query.

The simplest way to avoid the issue is to add to the column the table alias like below:

SELECT count(*) FROM `catalog`.`schema`.`t_table` AS main
WHERE main._col_check IN (
    SELECT DISTINCT sub._col_check FROM `catalog`.`schema`.`t_check_table` AS sub
)

View solution in original post

filipniziol · ‎09-02-2024

Hi @Harsha777 ,

What occurs is called column shadowing.
What happens is that the column names in main query and sub query are identica and the databricks engine after not finding it in sub query searches in the main query.

The simplest way to avoid the issue is to add to the column the table alias like below:

SELECT count(*) FROM `catalog`.`schema`.`t_table` AS main
WHERE main._col_check IN (
    SELECT DISTINCT sub._col_check FROM `catalog`.`schema`.`t_check_table` AS sub
)