cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Query with distinct sort and alias produces error column not found

Kotofosonline
New Contributor III

I’m trying to use sql query on azure-databricks with distinct sort and aliases

SELECT DISTINCT album.ArtistId AS my_alias 
FROM album ORDER BY album.ArtistId

The problem is that if I add an alias then I can not use not aliased name in the order by clause.

ORDER BY album.ArtistId part produces an error. ORDER BY my_alias works.

If I remove distinct it also works.

Error in SQL statement: AnalysisException: cannot resolve '`album.ArtistId`' given input columns: [my_alias]; line 2 pos 22;
'Sort ['album.ArtistId ASC NULLS FIRST], true
+- Distinct
   +- Project [ArtistId#2506 AS my_alias#2500]
      +- SubqueryAlias spark_catalog.chinook.album
         +- Relation[AlbumId#2504,Title#2505,ArtistId#2506] parquet

Seems like after Project step original column name is lost. That behavior is unexpected for SQL compared to other SQL dialects. And I can not find any documentation about it.

Is there any way to make this query run as is, without modification maybe by changing some execution flags?

1 REPLY 1

User16756723392
Databricks Employee
Databricks Employee

SELECT album.ArtistId ,DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId

Can you try this

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group