I’m trying to use sql query on azure-databricks with distinct sort and aliases
SELECT DISTINCT album.ArtistId AS my_alias
FROM album ORDER BY album.ArtistId
The problem is that if I add an alias then I can not use not aliased name in the order by clause.
ORDER BY album.ArtistId part produces an error. ORDER BY my_alias works.
If I remove distinct it also works.
Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22;
'Sort ['album.ArtistId ASC NULLS FIRST], true
+- Distinct
+- Project [ArtistId#2506 AS my_alias#2500]
+- SubqueryAlias spark_catalog.chinook.album
+- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet
Seems like after Project step original column name is lost. That behavior is unexpected for SQL compared to other SQL dialects. And I can not find any documentation about it.
Is there any way to make this query run as is, without modification maybe by changing some execution flags?