03-12-2024 12:05 PM
I have a pyspark.pandas.frame.DataFrame object (obtained by calling `pandas_api()` on a pyspark.sql.dataframe.DataFrame object). I have a complicated transformation that I would like to apply to this data, and in particular I would like to apply it in blocks, one block per distinct value of the column 'C'.
If it had been a pandas.core.frame.DataFrame object, I could do:
for _, chunk in df.groupby("C"):
    ...  # do stuff with chunk
When I try this with a pyspark.pandas.frame.DataFrame object, I get `KeyError: (0,)`.
My question is: how do I get access to the grouped data in a pyspark.pandas.groupby.DataFrameGroupBy object? Is this possible at all, or am I only allowed to run aggregate functions?
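For concreteness, here is a minimal sketch of the kind of per-block processing I have in mind, written against `groupby(...).apply`, which pandas-on-Spark documents as passing each group to the function as a plain pandas DataFrame. The names `transform_chunk` and `apply_per_group`, and the column `x`, are made up for illustration. My real transformation is more involved, and I have not confirmed that this is the recommended route.

```python
import pandas as pd


def transform_chunk(chunk: pd.DataFrame):
    """Hypothetical per-group transformation: centre column 'x' within the group.

    The return annotation is left off on purpose; as far as I understand,
    pandas-on-Spark then infers the output schema by running the function
    on a sample of the data.
    """
    out = chunk.copy()
    out["x_centered"] = out["x"] - out["x"].mean()
    return out


def apply_per_group(psdf):
    """Run transform_chunk once per distinct value of 'C'.

    psdf is assumed to be a pyspark.pandas DataFrame with columns 'C' and 'x';
    each group reaches transform_chunk as an ordinary pandas DataFrame.
    """
    return psdf.groupby("C").apply(transform_chunk)
```

With this shape the loop body moves into `transform_chunk`, so nothing iterates over the groups on the driver. Another option that seems to exist is `groupBy("C").applyInPandas(func, schema)` on the underlying pyspark.sql DataFrame, which requires the output schema to be spelled out explicitly.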