Iterating over a pyspark.pandas.groupby.DataFrameGroupBy

JacobKesinger · 03-12-2024

I have a pyspark.pandas.frame.DataFrame object, which I got by calling `pandas_api()` on a pyspark.sql.dataframe.DataFrame object. I have a complicated transformation that I would like to apply to this data, and in particular I would like to apply it in blocks based on the value of a column 'C'.

If it had been a pandas.core.frame.DataFrame object, I could do:

for _, chunk in df.groupby("C"):
    ...  # do stuff

When I try this with a pyspark.pandas.frame.DataFrame object, I get `KeyError: (0,)`.

My question is: how do I get access to the grouped data in a pyspark.pandas.groupby.DataFrameGroupBy object? Is this possible at all, or am I only allowed to run aggregate functions?
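For reference, here is a minimal sketch of two ways the per-group access asked about here could be approached. It is not a confirmed answer. The first filters the frame once per distinct value of 'C' instead of iterating the GroupBy object; the second hands each group to a function through `groupby(...).apply(...)`, which pandas-on-Spark documents as calling the function with a plain pandas DataFrame. The names `transform_chunk`, `iter_groups`, `apply_per_group`, and the column `v` are hypothetical, and the code only uses methods that exist on both pandas and pandas-on-Spark frames, so it can be tried on either.

```python
import pandas as pd


def transform_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the complicated per-group transformation.

    Receives one group's rows and centers the (assumed) column 'v'.
    """
    out = chunk.copy()
    out["v_centered"] = out["v"] - out["v"].mean()
    return out


def iter_groups(df, key="C"):
    """Yield (value, sub-frame) pairs by filtering on each distinct key value.

    This sidesteps iterating the GroupBy object. Key order is not
    guaranteed on a Spark-backed frame.
    """
    # drop_duplicates() and to_numpy() exist on both pandas and
    # pandas-on-Spark Series; tolist() converts to plain Python scalars.
    for value in df[key].drop_duplicates().to_numpy().tolist():
        yield value, df[df[key] == value]


def apply_per_group(df, key="C"):
    """Run transform_chunk once per group via groupby(...).apply(...)."""
    return df.groupby(key).apply(transform_chunk)
```

With a pandas-on-Spark frame, `iter_groups(df)` runs one filter per distinct value of 'C', so it is probably only reasonable when 'C' has few distinct values. `apply_per_group(df)` keeps the work distributed, but each group then has to fit in memory on a single executor.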
