I have files in Azure Data Lake, and I am using Auto Loader to read the incremental files.
The files don't have a primary key, so I want to generate a hash key from a few columns and use it as the primary key to apply changes.
The hash key column should be appended when I load the initial file, and it needs to be appended for the micro-batches as well.
But when I use sha2 to generate the hash key, I get an error.
Code reading input file 1:
from pyspark.sql.functions import sha2, concat_ws

inputpath = 'abfss://***@***.dfs.core.windows.net/test/'

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaEvolutionMode", "rescue")
      .option("cloudFiles.schemaLocation", checkpoint_path)
      .load(inputpath))

# this is the line that fails
df = df.withColumns("Hashkey", sha2(concat_ws(",", df['id'], df['product_Name'], df['Location'], df['offer_code']), 256))
The last line fails with:

AssertionError:
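
From the PySpark docs, I believe the problem is the call itself: withColumns (plural) takes a single dict of {column name: expression}, so passing the name and the expression as two separate arguments trips an internal assert and raises a bare AssertionError. A minimal sketch of the two variants I think should work instead, using the same columns as above:

from pyspark.sql.functions import sha2, concat_ws

# variant 1: withColumn (singular) takes the name and the expression as separate arguments
df = df.withColumn(
    "Hashkey",
    sha2(concat_ws(",", df["id"], df["product_Name"], df["Location"], df["offer_code"]), 256),
)

# variant 2: withColumns (plural) takes one dict of {name: expression}
df = df.withColumns({
    "Hashkey": sha2(concat_ws(",", df["id"], df["product_Name"], df["Location"], df["offer_code"]), 256),
})

One caveat: concat_ws skips NULL values, so two rows that differ only in a NULL column can produce the same hash; coalescing each column to a placeholder before concatenating may be safer if that matters.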
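
For the "use it as a primary key to apply changes" part, this is a minimal sketch of how I plan to upsert each micro-batch into a Delta table via foreachBatch and MERGE, keyed on the hash column. The target table name bronze_target is a placeholder, and I am assuming checkpoint_path is the same checkpoint location defined earlier:

from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # MERGE requires at most one source row per key within a batch
    batch_df = batch_df.dropDuplicates(["Hashkey"])
    # hypothetical Delta target table keyed on Hashkey
    target = DeltaTable.forName(spark, "bronze_target")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.Hashkey = s.Hashkey")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(df.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", checkpoint_path)
    .start())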