cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Sub-Query behavior in sql statements

Harsha777
New Contributor III

Hi Team,

I have a query with below construct in my project

SELECT count(*) FROM `catalog`.`schema`.`t_table`
WHERE _col_check IN (SELECT DISTINCT _col_check FROM `catalog`.`schema`.`t_check_table`)

Actually, there is no column "_col_check" in the sub-query table "t_check_table". (but present in main table "t_table"
The expectation is that the query will fail but interestingly the query is giving the total count of the main table "t_table".
When the sub-query is executed individually, it fails with column not found error.

If I use a column which is not present in the main table then the query fails.

I assume it is using the values of the main table when the column is present in the main table and raising error when the column is not present in both the tables.

I am not sure if I am over-looking anything here or is it an issue, could someone comment of any such observations?

1 ACCEPTED SOLUTION

Accepted Solutions

filipniziol
New Contributor III

Hi @Harsha777 ,

What occurs is called column shadowing.
What happens is that the column names in main query and sub query  are identica and the databricks engine after not finding it in sub query searches in the main query.

The simplest way to avoid the issue is to add to the column the table alias like below:

SELECT count(*) FROM `catalog`.`schema`.`t_table` AS main
WHERE main._col_check IN (
    SELECT DISTINCT sub._col_check FROM `catalog`.`schema`.`t_check_table` AS sub
)

 

View solution in original post

3 REPLIES 3

filipniziol
New Contributor III

Hi @Harsha777 ,

What occurs is called column shadowing.
What happens is that the column names in main query and sub query  are identica and the databricks engine after not finding it in sub query searches in the main query.

The simplest way to avoid the issue is to add to the column the table alias like below:

SELECT count(*) FROM `catalog`.`schema`.`t_table` AS main
WHERE main._col_check IN (
    SELECT DISTINCT sub._col_check FROM `catalog`.`schema`.`t_check_table` AS sub
)

 

thanks @filipniziol for sharing the information, that helps!!

Just curious to know if it is the databricks sql behavior or in general sql behavior with all databases?

Hi @Harsha777 ,
It's general, not related to databricks only.

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group