Hello everyone,
I am having trouble collecting a Spark DataFrame into an R data frame. I need to do this because I am using the TraMineR package, which is implemented only in R, while my data pre-processing was done in PySpark.
I am trying this:
events_df <- collect(events)
events_grp_df <- collect(events_grp)
The error I get is related to Kryo serialization:
"org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 58. To avoid this, increase spark.kryoserializer.buffer.max value"
Can anyone suggest an alternative to collect, or another way to solve this problem?
FYI: I tried to increase the buffer size with spark.conf.set("spark.kryoserializer.buffer.max.mb", "50000"), but it had no effect.
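In case it helps whoever answers: as far as I understand, spark.kryoserializer.buffer.max.mb is the older, deprecated name of spark.kryoserializer.buffer.max, the value has to stay below 2048m, and the setting is read when the Spark context starts, so changing it on an already running session may do nothing. Below is a minimal sketch of what I think setting it at session start from SparkR would look like. It assumes the session is not already created by the environment, and the 1g value is an arbitrary illustration, not something I have tested:

```r
library(SparkR)

# Assumption: no Spark session is running yet. On a managed cluster the same
# keys would instead go into the cluster's Spark configuration.
sparkR.session(
  sparkConfig = list(
    spark.serializer = "org.apache.spark.serializer.KryoSerializer",
    # Arbitrary illustrative value; Spark requires it to be below 2048m.
    spark.kryoserializer.buffer.max = "1g"
  )
)
```

I am not sure whether this alone is enough for data of my size, which is why I am also asking about alternatives to collect.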
Thanks in advance