Iteration - Pyspark vs Pandas
02-21-2023 03:21 AM
Hello. Could someone please explain why iteration over a Pyspark dataframe is way slower than over a Pandas dataframe?
Pyspark
df_list = df.collect()
for index in range(0, len(df_list)):
    .....
Pandas
df_pnd = df.toPandas()
for index, row in df_pnd.iterrows():
    ....
Thank you in advance
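One way to see where the time actually goes is to time the collection step and the loop separately for each approach. The sketch below is an illustration only, not part of the original post: `timed` and `compare_iteration` are hypothetical helper names, and `df` is assumed to be the Spark DataFrame from the question. Because Spark evaluates lazily, `collect()` and `toPandas()` each run the query unless `df` is cached, so whichever is measured may also be paying for upstream work.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_iteration(df):
    """Time collection and looping separately for both approaches.

    `df` is assumed to be a pyspark.sql.DataFrame; any object exposing
    collect() and toPandas() behaves the same way here.
    """
    timings = {}

    # PySpark path: collect() runs the query and returns a list of rows.
    rows, timings["collect"] = timed(df.collect)
    # The loop body is reduced to a count so only iteration cost is measured.
    _, timings["loop_rows"] = timed(lambda: sum(1 for _ in rows))

    # Pandas path: toPandas() also runs the query and brings it to the driver.
    pdf, timings["toPandas"] = timed(df.toPandas)
    _, timings["loop_iterrows"] = timed(lambda: sum(1 for _ in pdf.iterrows()))

    return timings
```

Calling `print(compare_iteration(df))` on the DataFrame in question shows whether the difference comes from `collect()` / `toPandas()` or from the loops themselves.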
Labels: Pandas, Pyspark Dataframe
1 REPLY
Anonymous
04-22-2023 12:11 AM
Hi @ELENI GEORGOUSI
Hope everything is going great.
Just wanted to check whether you were able to resolve your issue. If so, would you mind marking an answer as best so that other members can find the solution more quickly? If not, please let us know so we can help.
Cheers!