- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-02-2024 07:38 AM
Hi Team,
I have a query with below construct in my project
SELECT count(*) FROM `catalog`.`schema`.`t_table`
WHERE _col_check IN (SELECT DISTINCT _col_check FROM `catalog`.`schema`.`t_check_table`)
Actually, there is no column "_col_check" in the sub-query table "t_check_table". (but present in main table "t_table"
The expectation is that the query will fail but interestingly the query is giving the total count of the main table "t_table".
When the sub-query is executed individually, it fails with column not found error.
If I use a column which is not present in the main table then the query fails.
I assume it is using the values of the main table when the column is present in the main table and raising error when the column is not present in both the tables.
I am not sure if I am over-looking anything here or is it an issue, could someone comment of any such observations?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-02-2024 12:02 PM
Hi @Harsha777 ,
What occurs is called column shadowing.
What happens is that the column names in main query and sub query are identica and the databricks engine after not finding it in sub query searches in the main query.
The simplest way to avoid the issue is to add to the column the table alias like below:
SELECT count(*) FROM `catalog`.`schema`.`t_table` AS main
WHERE main._col_check IN (
SELECT DISTINCT sub._col_check FROM `catalog`.`schema`.`t_check_table` AS sub
)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-02-2024 12:02 PM
Hi @Harsha777 ,
What occurs is called column shadowing.
What happens is that the column names in main query and sub query are identica and the databricks engine after not finding it in sub query searches in the main query.
The simplest way to avoid the issue is to add to the column the table alias like below:
SELECT count(*) FROM `catalog`.`schema`.`t_table` AS main
WHERE main._col_check IN (
SELECT DISTINCT sub._col_check FROM `catalog`.`schema`.`t_check_table` AS sub
)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-03-2024 01:56 AM
thanks @filipniziol for sharing the information, that helps!!
Just curious to know if it is the databricks sql behavior or in general sql behavior with all databases?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-03-2024 02:05 AM
Hi @Harsha777 ,
It's general, not related to databricks only.

