Generate sha2 hash key while loading files to a Delta table
01-04-2023 09:23 PM
I have files in Azure Data Lake, and I am using Auto Loader to read the incremental files.
The files don't have a primary key, so I want to generate a hash key from some of the columns and use it as the primary key for applying changes.
The initial file should be loaded with the hash key column appended, and the hash key also needs to be appended for each micro-batch.
But when I use sha2 to generate the hash key, I get an error.
Code:
inputpath = 'abfss://***@***.dfs.core.windows.net/test/'
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaEvolutionMode", "rescue")
      .option("cloudFiles.schemaLocation", checkpoint_path)
      .load(inputpath))
df.withColumns("Hashkey", sha2(concat_ws(",", df['id'], df['product_Name'], df['Location'], df['offer_code']), 256))
I'm getting:
AssertionError:
01-05-2023 02:15 AM
Can you copy the whole error?
I bet it should be withColumn, not withColumns (remove the s).
01-05-2023 10:04 PM
Try withColumn. withColumns expects a dict mapping column names to Column expressions, so passing it a name string and a column (as in your snippet) trips its internal assertion, which is why you see a bare AssertionError. withColumn creates a single new column with whatever name you give it.
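Since sha2(concat_ws(",", ...), 256) is just SHA-256 over the comma-joined column values, you can reproduce the expected hash for a row in plain Python as a sanity check (a hypothetical helper, not part of the original code; note that Spark's concat_ws also skips NULL values, which this simple version does not):

```python
import hashlib

def row_hashkey(*values):
    # Mirror sha2(concat_ws(",", ...), 256): comma-join the values,
    # then return the SHA-256 digest as a lowercase hex string
    joined = ",".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

hk = row_hashkey("1", "widget", "US", "A1")
```

Comparing this against the Hashkey column Spark produces for the same row is a quick way to confirm the key generation is deterministic.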
01-05-2023 10:43 AM
Hi, could you please share the full error message?