Iโm trying to use sql query on azure-databricks with distinct sort and aliases
SELECT DISTINCT album.ArtistId AS my_alias
FROM album ORDER BY album.ArtistId
The problem is that if I add an alias then I can not use not aliased name in the order by clause.
ORDER BY album.ArtistId part produces an error. ORDER BY my_alias works.
If I remove distinct it also works.
Seems like after Project step original column name is lost. That behavior is unexpected for SQL compared to other SQL dialects. And I can not find any documentation about it.
Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22;
'Sort ['album.ArtistId ASC NULLS FIRST], true
+- Distinct
+- Project [ArtistId#2506 AS my_alias#2500]
+- SubqueryAlias spark_catalog.chinook.album
+- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet
Is there any way to make this query run as is, without modification maybe by changing some execution flags?
The query is generated by sqlachemy and works on other databases fine, I prefer to not edit it for databricks.
The schema used for testing is Chinook database.
The where clause works fine
SELECT DISTINCT album.ArtistId AS my_alias
FROM album WHERE album.ArtistId = 1