Hi Umesh,
If you want to ignore the null/empty values completely, you could simply filter them out before the groupBy. Or do you want to keep those values?
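For the first option, filtering before the groupBy might look like the sketch below. It assumes colE is a string column (so "empty" means the empty string), and the object name, method name and sample rows are hypothetical stand-ins for your sourceFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object DropNullsExample {
  // Drops rows whose colE is null or the empty string, then aggregates as usual
  def dropThenAggregate(sourceFrame: DataFrame): DataFrame =
    sourceFrame
      .filter(col("colE").isNotNull && col("colE") =!= "")
      .groupBy(col("colB"), col("colC"), col("colD"), col("colE"))
      .agg(sum(col("colA")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("drop-nulls").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; colE is an Option so it can hold nulls
    val sourceFrame = Seq(
      (1L, "b1", "c1", "d1", Some("e1")),
      (2L, "b1", "c1", "d1", Some("")),
      (3L, "b1", "c1", "d1", None)
    ).toDF("colA", "colB", "colC", "colD", "colE")

    dropThenAggregate(sourceFrame).show()
    spark.stop()
  }
}
```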
If you want to keep the null values and still avoid the skew, you could try splitting the DataFrame into the rows where colE is null and the rows where it isn't, and aggregating each part separately. See if this would meet your needs:
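import org.apache.spark.sql.functions.{isnull, sum}
import spark.implicits._ // enables the $"col" syntax; assumes your SparkSession is named spark

// Rows with a non-null colE: group on all four key columns
val noNulls = sourceFrame
  .filter(!isnull($"colE"))
  .groupBy($"colB", $"colC", $"colD", $"colE")
  .agg(sum($"colA"))

// Rows with a null colE: aggregate separately, leaving colE out of the grouping key
val onlyNulls = sourceFrame
  .filter(isnull($"colE"))
  .groupBy($"colB", $"colC", $"colD")
  .agg(sum($"colA"))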
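If you then need a single result, one way to recombine the two halves is to add colE back to the null half as a typed null column and union by name. This is a sketch rather than a drop-in solution: it relies on unionByName (Spark 2.3 or later), and the object name, method name and sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, isnull, lit, sum}

object SplitNullsExample {
  // Aggregates the non-null and null colE rows separately, then unions the results
  def splitAggregate(sourceFrame: DataFrame): DataFrame = {
    val noNulls = sourceFrame
      .filter(!isnull(col("colE")))
      .groupBy(col("colB"), col("colC"), col("colD"), col("colE"))
      .agg(sum(col("colA")))

    val onlyNulls = sourceFrame
      .filter(isnull(col("colE")))
      .groupBy(col("colB"), col("colC"), col("colD"))
      .agg(sum(col("colA")))

    // Re-add colE as a null of the original type so both schemas match by column name
    val colEType = sourceFrame.schema("colE").dataType
    noNulls.unionByName(onlyNulls.withColumn("colE", lit(null).cast(colEType)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split-nulls").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; colE is an Option so it can hold nulls
    val sourceFrame = Seq(
      (1L, "b1", "c1", "d1", Some("e1")),
      (2L, "b1", "c1", "d1", Some("e1")),
      (3L, "b1", "c1", "d1", None),
      (4L, "b1", "c1", "d1", None)
    ).toDF("colA", "colB", "colC", "colD", "colE")

    splitAggregate(sourceFrame).show()
    spark.stop()
  }
}
```

Keeping the null rows in their own aggregation means the split is explicit in the plan, and the union only runs over the already-aggregated (much smaller) results.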
You could also replace the null values before grouping, using the fill/replace methods in DataFrameNaFunctions (reachable as sourceFrame.na).
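As a sketch of that approach, na.fill can swap the null colE values for a placeholder before grouping. The placeholder value, object name and sample rows here are hypothetical. Note that filling with a single constant still puts all of those rows in the same group, just as null did, so this changes how the nulls are represented rather than spreading them across keys.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object FillNullsExample {
  // Replaces null colE values with a placeholder, then groups on all four key columns
  def fillThenAggregate(sourceFrame: DataFrame, placeholder: String): DataFrame =
    sourceFrame
      .na.fill(placeholder, Seq("colE"))
      .groupBy(col("colB"), col("colC"), col("colD"), col("colE"))
      .agg(sum(col("colA")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fill-nulls").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; colE is an Option so it can hold nulls
    val sourceFrame = Seq(
      (1L, "b1", "c1", "d1", Some("e1")),
      (2L, "b1", "c1", "d1", None),
      (3L, "b1", "c1", "d1", None)
    ).toDF("colA", "colB", "colC", "colD", "colE")

    fillThenAggregate(sourceFrame, "unknown").show()
    spark.stop()
  }
}
```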
Thanks,
Silvio