Generate sha2 hash key while loading files to a Delta table
01-04-2023 09:23 PM
I have files in Azure Data Lake, and I am using Auto Loader to read the incremental files.
The files don't have a primary key, so I want to generate a hash key from some of the columns and use it as the primary key for applying changes.
The initial file should be loaded with the hash key column appended, and the hash key also needs to be appended for each micro-batch.
But when I use sha2 to generate the hash key, I get an error.
Code:
inputpath = 'abfss://***@***.dfs.core.windows.net/test/'
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaEvolutionMode", "rescue")
      .option("cloudFiles.schemaLocation", checkpoint_path)
      .load(inputpath))
df.withColumns("Hashkey", sha2(concat_ws(",", df['id'], df['product_Name'], df['Location'], df['offer_code']), 256))
I'm getting:
AssertionError:
01-05-2023 02:15 AM
Can you copy the whole error?
I bet it should be withColumn, not withColumns (remove the s).
01-05-2023 10:04 PM
Try withColumn. withColumns expects a dict mapping column names to Column expressions, so passing it a name string and a column (as in your snippet) trips its internal assertion, which is why you see a bare AssertionError. withColumn creates a single new column with whatever name you give it.
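Since sha2(concat_ws(",", ...), 256) is just SHA-256 over the comma-joined column values, you can reproduce the expected hash for a row in plain Python as a sanity check (a hypothetical helper, not part of the original code; note that Spark's concat_ws also skips NULL values, which this simple version does not):

```python
import hashlib

def row_hashkey(*values):
    # Mirror sha2(concat_ws(",", ...), 256): comma-join the values,
    # then return the SHA-256 digest as a lowercase hex string
    joined = ",".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

hk = row_hashkey("1", "widget", "US", "A1")
```

Comparing this against the Hashkey column Spark produces for the same row is a quick way to confirm the key generation is deterministic.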
01-05-2023 10:43 AM
Hi, could you please share the full error message?