Spark SQL: group duplicates, collect_list into an array of structs, and evaluate the rows in each group.
I'm a beginner working with Spark SQL through the Java API. I have a dataset with duplicate clients grouped by ENTITY and DOCUMENT_ID, like this:

    .withColumn("ROWNUMBER",
        row_number().over(Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID")))

I added a ROWN...
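Since the question body is truncated, here is a minimal sketch of the logic it describes, written in plain Java (no Spark dependency) so the per-group behavior is easy to follow: group rows by (ENTITY, DOCUMENT_ID), order each group by ID, and keep the first row per group, which is what filtering on ROWNUMBER = 1 after a `row_number().over(Window.partitionBy(...).orderBy("ID"))` achieves in Spark. The `Client` record and its field names are hypothetical, not from the original post.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java simulation of the Spark pattern:
//   Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID") + row_number() == 1
// i.e. deduplicate by keeping the row with the lowest ID in each group.
public class DedupeSketch {
    // Hypothetical stand-in for one row of the dataset.
    record Client(String entity, String documentId, long id, String name) {}

    public static List<Client> firstPerGroup(List<Client> rows) {
        return rows.stream()
            // partitionBy(ENTITY, ENTITY_DOC) becomes a grouping key
            .collect(Collectors.groupingBy(c -> c.entity() + "|" + c.documentId()))
            .values().stream()
            // orderBy("ID") + keep ROWNUMBER == 1 becomes: min by id per group
            .map(group -> group.stream()
                .min(Comparator.comparingLong(Client::id))
                .orElseThrow())
            // stable output order for display
            .sorted(Comparator.comparingLong(Client::id))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Client> rows = List.of(
            new Client("E1", "D1", 3, "duplicate"),
            new Client("E1", "D1", 1, "keep"),
            new Client("E2", "D2", 2, "keep"));
        System.out.println(firstPerGroup(rows));
    }
}
```

In Spark itself the same idea can also be expressed without a window at all, by `groupBy("ENTITY", "ENTITY_DOC")` plus `collect_list(struct(...))`, and then evaluating the collected array per group (e.g. in a UDF), which is the approach the rest of this thread discusses.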
- 5669 Views
- 3 replies
- 3 kudos
Latest Reply
Hi @Kaniz Fatma, her answer didn't solve my problem, but it was useful for learning more about UDFs, which I didn't know about before.