<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: concat_ws() throws AnalysisException when too many columns are supplied in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/concat-ws-throws-analysisexception-when-too-many-columns-are/m-p/25860#M18055</link>
    <description>&lt;UL&gt;&lt;LI&gt;At least one of the column names may contain an unusual character, such as whitespace,&lt;/LI&gt;&lt;LI&gt;or at least one column has an incompatible type (for example, StructType).&lt;/LI&gt;&lt;LI&gt;You can also split your code into two or more steps: first build the list of columns as a variable, then create the concatenated column, then a new column with the SHA of that column. This is easier to debug, and it is also efficient in Spark, which uses lazy evaluation, logical/physical plans, and adaptive query execution.&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Fri, 11 Mar 2022 12:15:44 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-03-11T12:15:44Z</dc:date>
    <item>
      <title>concat_ws() throws AnalysisException when too many columns are supplied</title>
      <link>https://community.databricks.com/t5/data-engineering/concat-ws-throws-analysisexception-when-too-many-columns-are/m-p/25859#M18054</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm using concat_ws in Scala to calculate a checksum for the dataframe, i.e.:&lt;/P&gt;&lt;P&gt;df.withColumn("CHECKSUM", sha2(functions.concat_ws("", dataframe.columns.map(col): _*), 512))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have one example here with just 24 columns that already throws the following exception: org.apache.spark.sql.AnalysisException: cannot resolve 'concat_ws('', &amp;lt;list of the columns)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any ideas what's happening? I assume the list gets too long (character-wise), but I have no idea how to make this work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2022 11:47:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/concat-ws-throws-analysisexception-when-too-many-columns-are/m-p/25859#M18054</guid>
      <dc:creator>gzenz</dc:creator>
      <dc:date>2022-03-11T11:47:03Z</dc:date>
    </item>
    <item>
      <title>Re: concat_ws() throws AnalysisException when too many columns are supplied</title>
      <link>https://community.databricks.com/t5/data-engineering/concat-ws-throws-analysisexception-when-too-many-columns-are/m-p/25860#M18055</link>
      <description>&lt;UL&gt;&lt;LI&gt;At least one of the column names may contain an unusual character, such as whitespace,&lt;/LI&gt;&lt;LI&gt;or at least one column has an incompatible type (for example, StructType).&lt;/LI&gt;&lt;LI&gt;You can also split your code into two or more steps: first build the list of columns as a variable, then create the concatenated column, then a new column with the SHA of that column. This is easier to debug, and it is also efficient in Spark, which uses lazy evaluation, logical/physical plans, and adaptive query execution.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 11 Mar 2022 12:15:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/concat-ws-throws-analysisexception-when-too-many-columns-are/m-p/25860#M18055</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-03-11T12:15:44Z</dc:date>
    </item>
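    <!--
    Editor's note: a minimal Scala sketch of the three-step approach described in the reply above (build the column list as a variable, then the concatenated column, then the hash). The DataFrame name df, the column names, and the explicit cast to string are illustrative assumptions, not part of the original thread; the cast addresses the incompatible-type case (e.g. StructType) the reply mentions.

    ```scala
    import org.apache.spark.sql.functions.{col, concat_ws, sha2}

    // Step 1: build the column list as a variable so it can be inspected on its own.
    val cols = df.columns.map(col)

    // Step 2: cast every column to string, so columns of incompatible types
    // (e.g. StructType) no longer make concat_ws fail to resolve.
    val asStrings = cols.map(_.cast("string"))

    // Step 3: concatenate first, then hash, as two separate columns;
    // each intermediate stage can now be debugged independently.
    val withConcat = df.withColumn("CONCATENATED", concat_ws("", asStrings: _*))
    val withChecksum = withConcat.withColumn("CHECKSUM", sha2(col("CONCATENATED"), 512))
    ```

    Because Spark evaluates lazily, splitting the logic into named intermediate values costs nothing at runtime; the optimizer still produces a single plan.
    -->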
  </channel>
</rss>

