Data Engineering

Forum Posts

by Anonymous • Not applicable

02-08-2023 3:45:09 AM

2421 Views
4 replies
0 kudos

Objective is to make a table unique at ID using group by, concat_ws and collect_list, combining distinct values in one row.

Objective is to make the table unique at ID. The table structure is as in the attached image. The query used is:

select ID, concat_ws(' & ', collect_list(Distinct Gender)) as Gender from table group by ID

It can be possible if we can order values within collect_list and...
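On the ordering part of the question: in Spark SQL, `collect_list` and `collect_set` do not guarantee element order, so one common approach is to deduplicate and then sort the collected array before joining it, for example `concat_ws(' & ', sort_array(collect_set(Gender)))`. The sketch below is only a plain-Python model of that logic (group by ID, distinct non-NULL values, sorted, joined with ' & ') so the expected shape is easy to check. The column names come from the post; the sample rows are hypothetical, since the real table was only shown as an image.

```python
from collections import defaultdict


def unique_at_id(rows, sep=" & "):
    """Model of: SELECT ID, concat_ws(sep, sort_array(collect_set(Gender))) ... GROUP BY ID."""
    values_by_id = defaultdict(set)  # collect_set: distinct values per ID
    for row in rows:
        bucket = values_by_id[row["ID"]]  # every ID gets an output row, like GROUP BY
        if row["Gender"] is not None:  # collect_set ignores NULLs
            bucket.add(row["Gender"])
    # sorting (like sort_array) makes the joined string deterministic
    return {id_: sep.join(sorted(vals)) for id_, vals in values_by_id.items()}


# Hypothetical sample rows
rows = [
    {"ID": 1, "Gender": "M"},
    {"ID": 1, "Gender": "F"},
    {"ID": 1, "Gender": "M"},
    {"ID": 2, "Gender": "F"},
]
print(unique_at_id(rows))  # {1: 'F & M', 2: 'F'}
```

Sorting alphabetically is just one choice of order; if a different order is wanted (for example, first appearance by some timestamp), the values need a sort key that carries that information.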

Latest Reply

Anonymous
Not applicable

04-10-2023 1:37:37 AM

0 kudos

Hi @Rishabh Shanker, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

by Harun • Honored Contributor

03-22-2023 7:09:11 AM

9883 Views
2 replies
0 kudos

Issue with Pyspark GroupBy GroupedData

Hi guys, I am working on streaming data movement from bronze to silver. My bronze table has an entity_name column, and based on the entity_name column I need to create multiple silver tables. I tried the approach below, but it is failing with the error "'...
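The error message above is cut off, so this is only a guess at the cause: in PySpark, `df.groupBy(...)` returns a `GroupedData` object, which exposes aggregation methods such as `agg` and `count` but no `filter` or writer, so it cannot be written out as a table. A common way to fan one bronze stream out to several silver tables is to filter the data once per `entity_name` value, often inside `writeStream.foreachBatch`. The sketch below is a plain-Python model of just that routing step (one output list per entity); the row layout, the `payload` field and the entity names are hypothetical.

```python
from collections import defaultdict


def route_by_entity(batch):
    """Split one micro-batch of bronze rows into per-entity lists.

    Models calling filter(entity_name == name) once per distinct
    entity_name, instead of trying to write a grouped object.
    """
    targets = defaultdict(list)
    for row in batch:
        targets[row["entity_name"]].append(row)
    return dict(targets)


# Hypothetical micro-batch; the real bronze schema is not shown in the post
batch = [
    {"entity_name": "customer", "payload": "c1"},
    {"entity_name": "order", "payload": "o1"},
    {"entity_name": "customer", "payload": "c2"},
]
for entity, rows in route_by_entity(batch).items():
    # in Spark, each of these subsets would be appended to its own silver table
    print(entity, [r["payload"] for r in rows])
```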

Latest Reply

Anonymous
Not applicable

03-26-2023 10:23:37 PM

0 kudos

Hi @Harun Raseed Basheer, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best ...

by tusworten • New Contributor II

02-01-2022 8:39:13 AM

6711 Views
3 replies
3 kudos

Spark SQL Group by duplicates, collect_list in array of structs and evaluate rows in each group.

I'm a beginner working with Spark SQL in the Java API. I have a dataset with duplicate clients grouped by ENTITY and DOCUMENT_ID, like this:

.withColumn("ROWNUMBER", row_number().over(Window.partitionBy("ENTITY", "ENTITY_DOC").orderBy("ID")))

I added a ROWN...
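The window expression in the post numbers the rows inside each (ENTITY, ENTITY_DOC) group by ascending ID, so keeping only ROWNUMBER == 1 leaves one client per group. The sketch below is a plain-Python model of that numbering, not Spark code; the sample rows are hypothetical. Note that `row_number` gives distinct numbers even when IDs tie, and Spark does not guarantee which tied row gets 1; the model breaks ties by input order only because Python's sort is stable.

```python
from collections import defaultdict


def add_row_number(rows, partition_by, order_by):
    """Model of row_number().over(Window.partitionBy(*partition_by).orderBy(order_by))."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in partition_by)].append(row)
    numbered = []
    for members in groups.values():
        # stable sort: ties keep input order (Spark gives no such guarantee)
        ordered = sorted(members, key=lambda r: r[order_by])
        for n, row in enumerate(ordered, start=1):
            numbered.append({**row, "ROWNUMBER": n})
    return numbered


# Hypothetical duplicate clients
clients = [
    {"ID": 7, "ENTITY": "E1", "ENTITY_DOC": "D1"},
    {"ID": 3, "ENTITY": "E1", "ENTITY_DOC": "D1"},
    {"ID": 5, "ENTITY": "E2", "ENTITY_DOC": "D9"},
]
first_per_group = [
    r for r in add_row_number(clients, ["ENTITY", "ENTITY_DOC"], "ID")
    if r["ROWNUMBER"] == 1
]
print(sorted(r["ID"] for r in first_per_group))  # [3, 5]
```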

Latest Reply

tusworten
New Contributor II

02-07-2022 7:31:33 AM

3 kudos

Hi @Kaniz Fatma, her answer didn't solve my problem, but it was useful for learning more about UDFs, which I did not know about.

by SindhuG • New Contributor

07-29-2021 11:55:03 AM

1130 Views
0 replies
0 kudos

Hi all, I need to extract rows of dates from a dataframe based on a list of values (e.g. dates) located in a CSV file. Can anyone please help me? I have tried the groupby function but am not able to get the expected result. Thanks in advance.

My dataframe looks like this:

df = Date, column2, column3, Machine
1-jan-2020 A
2-jan-2020 --- A
18-jan-2020 A
11-jan-2020 B
12-jan-2020 B
6-feb-2020 C
7-feb-2020 --- C
14-feb-2020 C

The date details CSV file looks like this:

D = Machine, Selected Date
A 15-jan-2020
C 12-f...
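The post does not say exactly which rows count as a match (for example, 15-jan-2020 does not appear among machine A's dates), so the sketch below assumes one interpretation: for each machine listed in the CSV, keep the rows whose date is on or after that machine's selected date. Whatever the rule, the dates should be parsed first, because strings like "18-jan-2020" and "2-jan-2020" do not sort chronologically as text. This is plain Python with a hypothetical two-column CSV header (`Machine,Selected Date`) and only the machine A/B rows from the post (C's selected date is cut off); in a DataFrame the same idea is a join on Machine followed by a filter on the parsed dates.

```python
import csv
import io
from datetime import date, datetime


def parse_day(text: str) -> date:
    # "1-jan-2020" style; strptime's %b matches month abbreviations case-insensitively
    return datetime.strptime(text.strip(), "%d-%b-%Y").date()


def load_cutoffs(csv_text: str) -> dict:
    """Read the hypothetical 'Machine,Selected Date' CSV into {machine: date}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {r["Machine"].strip(): parse_day(r["Selected Date"]) for r in reader}


def rows_on_or_after(rows, cutoffs):
    # ASSUMPTION: "on or after the selected date"; use == for exact matches instead
    return [
        r for r in rows
        if r["Machine"] in cutoffs and parse_day(r["Date"]) >= cutoffs[r["Machine"]]
    ]


# Rows from the post (other columns omitted); machine B has no selected date
df_rows = [
    {"Date": "1-jan-2020", "Machine": "A"},
    {"Date": "2-jan-2020", "Machine": "A"},
    {"Date": "18-jan-2020", "Machine": "A"},
    {"Date": "11-jan-2020", "Machine": "B"},
]
cutoffs = load_cutoffs("Machine,Selected Date\nA,15-jan-2020\n")
print(rows_on_or_after(df_rows, cutoffs))  # [{'Date': '18-jan-2020', 'Machine': 'A'}]
```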

by LaurentThiebaud • New Contributor

10-07-2019 12:01:20 AM

6144 Views
1 replies
0 kudos

Sort within a groupBy with dataframe

Using Spark DataFrame, e.g.

myDf
  .filter(col("timestamp").gt(15000))
  .groupBy("groupingKey")
  .agg(collect_list("aDoubleValue"))

I want the collect_list to return the result, but ordered according to "timestamp", i.e. I want the GroupBy results...

Latest Reply

shyam_9
Databricks Employee

10-07-2019 1:43:59 AM

0 kudos

Hi @Laurent Thiebaud, please use the format below to sort within a groupBy:

import org.apache.spark.sql.functions._
df.groupBy("columnA").agg(sort_array(collect_list("columnB")))
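One caveat on this reply: `sort_array(collect_list("columnB"))` sorts each group's list by the values of columnB themselves, not by "timestamp" as the question asks. A common idiom for ordering by timestamp is to collect structs with the timestamp as the first field, for example `sort_array(collect_list(struct("timestamp", "aDoubleValue")))`, since structs compare field by field, and then extract the aDoubleValue field from the sorted array. The sketch below is a plain-Python model (not Spark code) of both behaviors on hypothetical data, so the difference is visible.

```python
from collections import defaultdict


def group_values(rows, min_ts=15000):
    """Group rows after filter(timestamp > min_ts), keeping (timestamp, value) pairs."""
    groups = defaultdict(list)
    for r in rows:
        if r["timestamp"] > min_ts:
            groups[r["groupingKey"]].append((r["timestamp"], r["aDoubleValue"]))
    return groups


def ordered_by_value(rows):
    # like sort_array(collect_list("aDoubleValue")): ordered by the values themselves
    return {k: sorted(v for _, v in pairs) for k, pairs in group_values(rows).items()}


def ordered_by_timestamp(rows):
    # like sort_array(collect_list(struct("timestamp", "aDoubleValue"))):
    # tuples compare field by field, so the timestamp decides the order
    return {k: [v for _, v in sorted(pairs)] for k, pairs in group_values(rows).items()}


rows = [  # hypothetical data
    {"groupingKey": "g1", "timestamp": 15003, "aDoubleValue": 2.0},
    {"groupingKey": "g1", "timestamp": 15001, "aDoubleValue": 9.0},
    {"groupingKey": "g1", "timestamp": 15002, "aDoubleValue": 1.0},
    {"groupingKey": "g1", "timestamp": 14000, "aDoubleValue": 5.0},  # filtered out
]
print(ordered_by_value(rows))      # {'g1': [1.0, 2.0, 9.0]}
print(ordered_by_timestamp(rows))  # {'g1': [9.0, 1.0, 2.0]}
```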

Databricks Community