topic Re: Sub-Query behavior in sql statements in Data Engineering

Sub-Query behavior in sql statements

Harsha777 — Mon, 02 Sep 2024 14:38:30 GMT

Hi Team,

I have a query with below construct in my project

SELECT count(*) FROM `catalog`.`schema`.`t_table`
WHERE _col_check IN (SELECT DISTINCT _col_check FROM `catalog`.`schema`.`t_check_table`)

Actually, there is no column "_col_check" in the sub-query table "t_check_table". (but present in main table "t_table"
The expectation is that the query will fail but interestingly the query is giving the total count of the main table "t_table".
When the sub-query is executed individually, it fails with column not found error.

If I use a column which is not present in the main table then the query fails.

I assume it is using the values of the main table when the column is present in the main table and raising error when the column is not present in both the tables.

I am not sure if I am over-looking anything here or is it an issue, could someone comment of any such observations?

Re: Sub-Query behavior in sql statements

filipniziol — Mon, 02 Sep 2024 19:02:06 GMT

Hi @Harsha777 ,

What occurs is called column shadowing.
What happens is that the column names in main query and sub query are identica and the databricks engine after not finding it in sub query searches in the main query.

The simplest way to avoid the issue is to add to the column the table alias like below:

SELECT count(*) FROM `catalog`.`schema`.`t_table` AS main WHERE main._col_check IN ( SELECT DISTINCT sub._col_check FROM `catalog`.`schema`.`t_check_table` AS sub )

Re: Sub-Query behavior in sql statements

Harsha777 — Tue, 03 Sep 2024 08:56:46 GMT

thanks @filipniziol for sharing the information, that helps!!

Just curious to know if it is the databricks sql behavior or in general sql behavior with all databases?

Re: Sub-Query behavior in sql statements

filipniziol — Tue, 03 Sep 2024 09:05:19 GMT

Hi @Harsha777 ,
It's general, not related to databricks only.