@cfregly​: For DataFrames, you can use groupBy without any aggregations by passing an empty aggregation map: Df.groupBy(Df["column_name"]).agg({})
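A minimal sketch of that idea, with assumed column names and sample data (not from the original thread): grouping with an empty aggregation map returns one row per distinct grouping key, effectively a distinct over the grouping column.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("groupby-no-agg").getOrCreate()

# Assumed sample data; replace with your own DataFrame.
df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("b", 3)],
    ["column_name", "value"],
)

# groupBy with an empty aggregation map: the result keeps only the
# grouping column, one row per distinct key (row order is not guaranteed).
grouped = df.groupBy(df["column_name"]).agg({})
grouped.show()
```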
Hi, I have a Spark job that does a group by, and I can't avoid it because of my use case. I have a large dataset, around 1 TB, which I need to process/update in a DataFrame. My job shuffles a huge amount of data and slows down because of the shuffle and the group by. One r...
Hi Umesh, if you want to completely ignore the null/empty values, you could simply filter before you do the groupBy, but do you want to keep those values? If you want to keep the null values and avoid the skew, you could try splitting the DataF...
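A rough sketch of both suggestions, assuming hypothetical column names ("key", "value") and an assumed input path, and assuming the skew comes from the null keys: filter the null/empty keys before the groupBy, or split the DataFrame so the skewed portion is aggregated separately and then unioned back.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-sketch").getOrCreate()

# Hypothetical input; substitute your actual source.
df = spark.read.parquet("/path/to/large_dataset")

# Option 1: drop null/empty keys before the groupBy so they never
# participate in the shuffle.
filtered = df.filter(F.col("key").isNotNull() & (F.col("key") != ""))
agg_filtered = filtered.groupBy("key").agg(F.sum("value").alias("total"))

# Option 2: keep the nulls but isolate the skewed portion (here assumed
# to be the null keys), aggregate the two parts separately, then union.
skewed = df.filter(F.col("key").isNull())
rest = df.filter(F.col("key").isNotNull())

agg_skewed = skewed.groupBy("key").agg(F.sum("value").alias("total"))
agg_rest = rest.groupBy("key").agg(F.sum("value").alias("total"))

result = agg_rest.unionByName(agg_skewed)
```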