Query with distinct sort and alias produces error column not found
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-08-2021 04:41 AM
I’m trying to use sql query on azure-databricks with distinct sort and aliases
SELECT DISTINCT album.ArtistId AS my_alias
FROM album ORDER BY album.ArtistId
The problem is that if I add an alias then I can not use not aliased name in the order by clause.
ORDER BY album.ArtistId part produces an error. ORDER BY my_alias works.
If I remove distinct it also works.
Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22;
'Sort ['album.ArtistId ASC NULLS FIRST], true
+- Distinct
+- Project [ArtistId#2506 AS my_alias#2500]
+- SubqueryAlias spark_catalog.chinook.album
+- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet
Seems like after Project step original column name is lost. That behavior is unexpected for SQL compared to other SQL dialects. And I can not find any documentation about it.
Is there any way to make this query run as is, without modification maybe by changing some execution flags?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-14-2023 12:22 AM
SELECT album.ArtistId ,DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId
Can you try this

