Hello. Could someone please explain why iterating over a PySpark DataFrame is so much slower than iterating over a Pandas DataFrame?
PySpark:

    df_list = df.collect()
    for index in range(len(df_list)):
        ...
Pandas:

    df_pnd = df.toPandas()
    for index, row in df_pnd.iterrows():
        ...
Thank you in advance.