Databricks Community

Kotofosonline · ‎09-08-2021

I’m trying to use sql query on azure-databricks with distinct sort and aliases

SELECT DISTINCT album.ArtistId AS my_alias 
FROM album ORDER BY album.ArtistId

The problem is that if I add an alias then I can not use not aliased name in the order by clause.

ORDER BY album.ArtistId part produces an error. ORDER BY my_alias works.

If I remove distinct it also works.

Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22;
'Sort ['album.ArtistId ASC NULLS FIRST], true
+- Distinct
   +- Project [ArtistId#2506 AS my_alias#2500]
      +- SubqueryAlias spark_catalog.chinook.album
         +- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet

Seems like after Project step original column name is lost. That behavior is unexpected for SQL compared to other SQL dialects. And I can not find any documentation about it.

Is there any way to make this query run as is, without modification maybe by changing some execution flags?