The objective is to make the table unique at ID. The table structure is as in the attached image. The query used is:

    SELECT ID,
           concat_ws(' & ', collect_list(DISTINCT Gender)) AS Gender
    FROM table
    GROUP BY ID

It would be possible if we could order the values within collect_list and...
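A plain-Python sketch of what the grouped query is meant to produce, assuming a table of (ID, Gender) rows; the sample data is made up for illustration. Sorting the distinct genders before joining makes the output deterministic, which is exactly what ordering the values inside collect_list would achieve:

```python
from collections import defaultdict

# Hypothetical sample rows: (ID, Gender); values are illustrative only.
rows = [(1, "M"), (1, "F"), (1, "M"), (2, "F"), (3, "M")]

# Group distinct genders per ID, then sort so the '&'-joined output is
# deterministic, mimicking
# concat_ws(' & ', sort_array(collect_list(DISTINCT Gender))).
genders = defaultdict(set)
for id_, gender in rows:
    genders[id_].add(gender)

unique = {id_: " & ".join(sorted(vals)) for id_, vals in genders.items()}
print(unique)  # {1: 'F & M', 2: 'F', 3: 'M'}
```

In Spark SQL the same determinism comes from wrapping the collected list in sort_array before concatenating.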
Hi guys, I am working on streaming data movement from bronze to silver. My bronze table has an entity_name column, and based on that column I need to create multiple silver tables. I tried the approach below, but it is failing with the error "'...
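One common way to fan a single stream out into several tables is Structured Streaming's foreachBatch, where each micro-batch is split by entity_name and each slice is appended to its own silver table. Below is a plain-Python sketch of that idea; the rows, table names, and write_to_silver helper are hypothetical stand-ins, not the poster's actual code:

```python
# A plain-Python sketch of the foreachBatch fan-out idea: each micro-batch is
# split by entity_name and each slice is appended to its own silver table.
silver_tables = {}

def write_to_silver(table_name, rows):
    # Stand-in for something like df.write.mode("append").saveAsTable(table_name).
    silver_tables.setdefault(table_name, []).extend(rows)

def process_batch(batch_rows):
    # Group the micro-batch rows by entity_name: one silver table per entity.
    by_entity = {}
    for row in batch_rows:
        by_entity.setdefault(row["entity_name"], []).append(row)
    for entity, rows in by_entity.items():
        write_to_silver(f"silver_{entity}", rows)

process_batch([
    {"entity_name": "orders", "id": 1},
    {"entity_name": "customers", "id": 2},
    {"entity_name": "orders", "id": 3},
])
print(sorted(silver_tables))  # ['silver_customers', 'silver_orders']
```

In real PySpark, process_batch would be passed to writeStream.foreachBatch, and the grouping would be a filter per distinct entity_name on the batch DataFrame.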
I'm a beginner working with Spark SQL in the Java API. I have a dataset with duplicate clients grouped by ENTITY and DOCUMENT_ID, like this:

    .withColumn("ROWNUMBER", row_number().over(Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID")))

I added a ROWN...
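The usual deduplication pattern with that window is to keep only the rows where ROWNUMBER equals 1. A plain-Python sketch of what row_number over that window does (the sample rows are made up):

```python
from itertools import groupby

# Hypothetical client rows; duplicates share the same (ENTITY, ENTITY_DOC).
rows = [
    {"ENTITY": "A", "ENTITY_DOC": "d1", "ID": 3},
    {"ENTITY": "A", "ENTITY_DOC": "d1", "ID": 1},
    {"ENTITY": "B", "ENTITY_DOC": "d2", "ID": 2},
]

# Mimic row_number().over(Window.partitionBy("ENTITY","ENTITY_DOC").orderBy("ID")):
# sort by the partition key plus ID, then keep the first row of each partition,
# i.e. the rows that would get ROWNUMBER == 1.
key = lambda r: (r["ENTITY"], r["ENTITY_DOC"])
rows.sort(key=lambda r: (r["ENTITY"], r["ENTITY_DOC"], r["ID"]))
deduped = [next(grp) for _, grp in groupby(rows, key=key)]
print([r["ID"] for r in deduped])  # [1, 2]
```

In the Java API the equivalent final step would be .filter(col("ROWNUMBER").equalTo(1)).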
My dataframe looks like this:

df =

    Date         column2  column3  Machine
    1-jan-2020                     A
    2-jan-2020   ---               A
    18-jan-2020                    A
    11-jan-2020                    B
    12-jan-2020                    B
    6-feb-2020                     C
    7-feb-2020   ---               C
    14-feb-2020                    C
The date-details CSV file looks like this:

D =

    Machine  Selected Date
    A        15-jan-2020
    C        12-f...
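Since the post is truncated, the exact goal isn't stated, but a common pattern for this shape of problem is to join each machine's selected date onto df so the two dates can be compared row by row. A plain-Python sketch under that assumption (the rows, dates, and the "keep rows on or after the selected date" rule are all illustrative guesses):

```python
from datetime import datetime

# Parse dates in the day-month-year style used in the post, e.g. "1-jan-2020".
parse = lambda s: datetime.strptime(s, "%d-%b-%Y")

# Hypothetical reconstruction of the two inputs; values are illustrative only.
df = [("1-jan-2020", "A"), ("18-jan-2020", "A"), ("12-feb-2020", "C")]
selected = {"A": "15-jan-2020", "C": "12-feb-2020"}

# Join the per-machine selected date onto each row so the dates can be compared;
# here we keep rows on or after the machine's selected date (one plausible intent).
kept = [
    (date, machine)
    for date, machine in df
    if machine in selected and parse(date) >= parse(selected[machine])
]
print(kept)  # [('18-jan-2020', 'A'), ('12-feb-2020', 'C')]
```

In Spark this would be a join on Machine followed by a filter comparing the two date columns.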
Using a Spark DataFrame, e.g.
myDf
.filter(col("timestamp").gt(15000))
.groupBy("groupingKey")
.agg(collect_list("aDoubleValue"))
I want collect_list to return the result, but ordered according to "timestamp", i.e. I want the groupBy results...
Hi @Laurent Thiebaud, please use the format below to sort within a groupBy:

import org.apache.spark.sql.functions._
df.groupBy("columnA").agg(sort_array(collect_list("columnB")))
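Note that sort_array orders the collected values themselves; to order by a different column such as "timestamp", a common pattern is to collect (timestamp, value) pairs, sort them, and then strip the timestamp. A plain-Python sketch of that idea (sample data made up for illustration):

```python
from collections import defaultdict

# Hypothetical (groupingKey, timestamp, aDoubleValue) rows.
rows = [("k1", 3, 30.0), ("k1", 1, 10.0), ("k1", 2, 20.0), ("k2", 5, 50.0)]

# Collect (timestamp, value) pairs per key, sort by timestamp, keep the values --
# the same idea as sort_array(collect_list(struct("timestamp", "aDoubleValue")))
# followed by extracting the struct's value field.
pairs = defaultdict(list)
for key, ts, value in rows:
    pairs[key].append((ts, value))

ordered = {key: [v for _, v in sorted(ps)] for key, ps in pairs.items()}
print(ordered)  # {'k1': [10.0, 20.0, 30.0], 'k2': [50.0]}
```

In Spark, collecting structs works because sort_array sorts structs by their first field, here the timestamp.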