Data Engineering

concat_ws() throws AnalysisException when too many columns are supplied

gzenz
New Contributor II

Hi,

I'm using concat_ws in Scala to calculate a checksum for a DataFrame, i.e.:

df.withColumn("CHECKSUM", sha2(functions.concat_ws("", df.columns.map(col): _*), 512))

I have one example here with just 24 columns that already throws the following exception: org.apache.spark.sql.AnalysisException: cannot resolve 'concat_ws('', <list of the columns>)'

Any ideas what's happening? I assume the list gets too long (character-wise), but I have no idea how to make this work.

Thanks!

Accepted Solution

Hubert-Dudek
Esteemed Contributor III
  • At least one of the column names may contain a strange character, whitespace, or something similar.
  • Or at least one column's type is not compatible with concat_ws (for example, StructType).
  • You can also split your code into two or more steps: first generate the list of columns as a variable, then create the concatenated column, then a new column with the SHA of that column (see the sketch below). It is easier to debug, and the extra steps cost Spark nothing, since it uses lazy evaluation, logical/physical plans, and adaptive query execution.
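Here is a minimal sketch of that step-by-step approach, assuming df is the DataFrame from the question. The explicit cast to string follows from the second bullet: concat_ws only accepts string (or array-of-string) inputs, so a column of an incompatible type such as StructType makes the analyzer fail. df(c) is used instead of col(c) so that names containing dots or other odd characters resolve literally:

import org.apache.spark.sql.functions.{concat_ws, sha2}

// Step 1: build the column list, casting each column to string so that
// non-string and complex types (e.g. StructType) are accepted by concat_ws.
// df(c) resolves the name literally; col(c) would treat dots as nested fields.
val stringCols = df.columns.map(c => df(c).cast("string"))

// Step 2: concatenate everything into one intermediate column.
val withConcat = df.withColumn("_concat", concat_ws("", stringCols: _*))

// Step 3: hash the intermediate column and drop it.
val withChecksum = withConcat
  .withColumn("CHECKSUM", sha2(withConcat("_concat"), 512))
  .drop("_concat")

One caveat worth knowing: concat_ws skips null values, so two rows that differ only in which column is null can produce the same checksum. Coalescing each column to a placeholder before concatenating avoids that.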
