Data Engineering

by cfregly • Contributor

05-26-2015 11:38:48 AM

5358 Views
4 replies
0 kudos

How do I group my dataset by a key or combination of keys without doing any aggregations using RDDs, DataFrames, and SQL?

Latest Reply

GeethGovindSrin
New Contributor II

12-19-2019 2:47:04 AM

0 kudos

@cfregly: For DataFrames, you can use the following code to call groupBy without aggregations: Df.groupBy(Df["column_name"]).agg({})
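To make the goal of the question concrete, here is a minimal plain-Python sketch (not Spark code) of what "grouping by a key or combination of keys without aggregating" produces: every row is kept and simply bucketed under its key. The helper `group_rows` and the sample columns `dept`, `city`, and `name` are hypothetical names chosen for illustration. On pair RDDs, `groupByKey` yields a similar key-to-values shape; for DataFrames and SQL, collecting rows with something like `collect_list` is one common way to approximate it, though that is technically still an aggregate.

```python
from collections import defaultdict


def group_rows(rows, key_columns):
    """Bucket dict rows by one or more key columns, keeping every row."""
    groups = defaultdict(list)
    for row in rows:
        # A tuple key handles a single column and a combination of columns alike.
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row)
    return dict(groups)


# Hypothetical sample data, for illustration only.
rows = [
    {"dept": "eng", "city": "SF", "name": "ann"},
    {"dept": "eng", "city": "SF", "name": "bob"},
    {"dept": "ops", "city": "NY", "name": "cy"},
]

by_dept = group_rows(rows, ["dept"])                # single key
by_dept_city = group_rows(rows, ["dept", "city"])   # combination of keys
```

Note that nothing is summed, counted, or reduced here; each group is just the list of original rows that share the key.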



by UmeshKacha • New Contributor II

05-21-2016 1:37:22 PM

12448 Views
3 replies
0 kudos

How to avoid empty/null keys in DataFrame groupby?

Hi, I have a Spark job that does a group by, which I can't avoid because of my use case. I have a large dataset, around 1 TB, that I need to process/update in a DataFrame. My job shuffles a huge amount of data, and the shuffling and groupBy slow things down. One r...


Latest Reply

silvio
Databricks Employee

06-03-2016 9:39:57 AM

0 kudos

Hi Umesh, If you want to completely ignore the null/empty values, you could simply filter them out before you do the groupBy. Or do you want to keep those values? If you want to keep the null values and avoid the skew, you could try splitting the DataF...
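The first suggestion above (drop null/empty keys before grouping) can be sketched in plain Python as follows. This is only a model of the idea, not Spark code: `is_blank`, `group_non_blank`, and the `user` column are hypothetical names, and treating whitespace-only strings as empty is an assumption. In a DataFrame job the equivalent step would be a filter on the key column ahead of the groupBy. The second suggestion is cut off in the excerpt, so it is not sketched here.

```python
def is_blank(value):
    """Treat None and empty/whitespace-only strings as missing keys (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def group_non_blank(rows, key_column):
    """Group rows by key_column, skipping rows whose key is null or empty."""
    groups = {}
    skipped = 0
    for row in rows:
        key = row.get(key_column)
        if is_blank(key):
            # Filtering first keeps all the null/empty rows out of one huge group.
            skipped += 1
            continue
        groups.setdefault(key, []).append(row)
    return groups, skipped


# Hypothetical sample data, for illustration only.
events = [
    {"user": "u1", "action": "click"},
    {"user": None, "action": "view"},
    {"user": "", "action": "view"},
    {"user": "u1", "action": "buy"},
    {"user": "u2", "action": "click"},
]

groups, skipped = group_non_blank(events, "user")
```

Counting the skipped rows is a cheap way to see how much of the data the null/empty keys represent before deciding whether dropping them is acceptable.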


