cancel
Showing results forĀ 
Search instead forĀ 
Did you mean:Ā 
Data Engineering
cancel
Showing results forĀ 
Search instead forĀ 
Did you mean:Ā 

Query with distinct sort and alias produces error column not found

Kotofosonline
New Contributor III

Iā€™m trying to use sql query on azure-databricks with distinct sort and aliases

SELECT DISTINCT album.ArtistId AS my_alias 
FROM album ORDER BY album.ArtistId

The problem is that if I add an alias then I can not use not aliased name in the order by clause.

ORDER BY album.ArtistId part produces an error. ORDER BY my_alias works.

If I remove distinct it also works.

Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22;
'Sort ['album.ArtistId ASC NULLS FIRST], true
+- Distinct
   +- Project [ArtistId#2506 AS my_alias#2500]
      +- SubqueryAlias spark_catalog.chinook.album
         +- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet

Seems like after Project step original column name is lost. That behavior is unexpected for SQL compared to other SQL dialects. And I can not find any documentation about it.

Is there any way to make this query run as is, without modification maybe by changing some execution flags?

2 REPLIES 2

User16756723392
New Contributor III

SELECT album.ArtistId ,DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId

Can you try this

Hi @Kotofos onlineā€‹ā€‹, Share the wisdom! By marking the best answers, you help others in our community find valuable information quickly and efficiently.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.