Issue with PySpark GroupBy GroupedData

Harun · 03-22-2023

Hi Guys,

I am working on streaming data movement from bronze to silver. My bronze table has an entity_name column, and based on that column I need to create multiple silver tables.

I tried the approach below, but it fails with the error "'GroupedData' object has no attribute 'get_group'".

Sample code snippet:

grouped_df = bronze_df.groupBy("entity_name")

entity_names = [row.entity_name for row in grouped_df.agg({"entity_name": "first"}).collect()]

for entity_name in entity_names:
    # fails here: 'GroupedData' object has no attribute 'get_group'
    entity_df = grouped_df.get_group(entity_name)

I think a where/filter clause could do the job, but in my view it would not be an efficient solution. Is there any other approach to doing this?
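For reference, here is a minimal sketch of the where/filter variant I have in mind. It assumes the stream is processed with foreachBatch and that each entity goes to a table named silver.<entity_name>. The names write_batch_per_entity and silver_table_for, and the naming rule itself, are hypothetical and only for illustration. Persisting the micro-batch is meant to avoid recomputing it for every entity.

```python
from __future__ import annotations

import re
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # used for type hints only
    from pyspark.sql import DataFrame


def silver_table_for(entity_name: str) -> str:
    """Hypothetical naming rule: 'Sales Order' becomes 'silver.sales_order'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", entity_name).strip("_").lower()
    return f"silver.{cleaned}"


def write_batch_per_entity(batch_df: DataFrame, batch_id: int) -> None:
    """foreachBatch callback: append each entity's rows to its own silver table."""
    batch_df.persist()  # the micro-batch is scanned once per entity below
    try:
        # Distinct entity names present in this micro-batch only.
        rows = batch_df.select("entity_name").distinct().collect()
        for row in rows:
            name = row["entity_name"]
            if name is None:
                continue  # assumption: rows without an entity name are skipped
            entity_df = batch_df.filter(batch_df["entity_name"] == name)
            entity_df.write.mode("append").saveAsTable(silver_table_for(name))
    finally:
        batch_df.unpersist()
```

It would be wired up with something like bronze_df.writeStream.foreachBatch(write_batch_per_entity).option("checkpointLocation", <path>).start(), where <path> is a placeholder for a real checkpoint location.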

TIA.