
agg function not working for multiple aggregations

FrancisLau
New Contributor

Data has 2 columns:

|requestDate|requestDuration|
| 2015-06-17|            104|

Here is the code:

avgSaveTimesByDate = gridSaves.groupBy(gridSaves.requestDate).agg({"requestDuration": "min", "requestDuration": "max","requestDuration": "avg"})

avgSaveTimesByDate.show(100)

Summary of Issue

I expect 4 columns of data: date, min, max and average, but only the date and average show up. The first 2 aggregations are missing. If I move max to the last position, only date and max show up. Very weird.

+-----------+--------------------+
|requestDate|AVG(requestDuration)|
+-----------+--------------------+
| 2015-06-10|   750.8886326991035|

Am I doing this incorrectly? I am trying to get a dataframe for a box plot.

1 ACCEPTED SOLUTION


ReKa
New Contributor III

My guess is that this does not work because the dictionary input does not have unique keys. With this syntax, column names are the keys, and a Python dictionary keeps only the last value given for a repeated key. So if you specify two or more aggregations for the same column, only the last one is actually passed to agg().
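The duplicate-key behavior can be checked in plain Python, without Spark. A minimal sketch, using the same dictionary literal as in the question (the variable names `aggs` and `reordered` are only for illustration):

```python
# The dict literal from the question: the key "requestDuration" appears three times.
aggs = {"requestDuration": "min", "requestDuration": "max", "requestDuration": "avg"}

# A Python dict holds one entry per key; the last value written wins.
print(aggs)       # {'requestDuration': 'avg'}
print(len(aggs))  # 1

# Moving "max" to the last position leaves only "max".
reordered = {"requestDuration": "min", "requestDuration": "avg", "requestDuration": "max"}
print(reordered)  # {'requestDuration': 'max'}
```

This matches what the question describes: whichever aggregation is written last is the only one that shows up.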


2 REPLIES

rlgarris
Databricks Employee

Hi Francis,

Thanks for reaching out.

I just tried this in version 2.0 of Databricks and it appeared to work as expected.

Are you using version 2.0 and Spark 1.4?

If so I would suggest using this alternate syntax:

from pyspark.sql import functions as F

aggs = df.groupBy("cut").agg(df.cut, F.min("carat"), F.max("carat"), F.avg("carat"))

Let me know if that works for you.

