Hi, it is quite normal for converting a DataFrame from Spark to pandas to take time, since toPandas() collects all the data onto the driver.
However, there is a way to optimize it.
Enable Arrow optimization: starting from Spark 3.0.0, we can enable Arrow optimization with the setting below. This speeds up the conversion by using Apache Arrow for faster data transfer between Spark and Python.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
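As a rough sketch of how this could fit into a conversion step, the helper below sets the Arrow flag and then calls toPandas(). The function name to_pandas_with_arrow and the max_rows parameter are hypothetical, added only for illustration; it assumes you already have a SparkSession (spark) and a Spark DataFrame (df).

```python
def to_pandas_with_arrow(spark, df, max_rows=None):
    """Hypothetical helper: convert a Spark DataFrame to pandas with Arrow enabled.

    `spark` is assumed to be an existing SparkSession and `df` a Spark DataFrame.
    """
    # Turn on Arrow-based transfer between the JVM and Python.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

    # Optional (an assumption, not part of the original answer): cap the number
    # of rows, because toPandas() brings the whole result onto the driver.
    if max_rows is not None:
        df = df.limit(max_rows)

    return df.toPandas()
```

You would call it as pdf = to_pandas_with_arrow(spark, df). Arrow support needs the pyarrow package installed on the cluster, and the actual speedup depends on the data types and size of the DataFrame.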
Harshit Kesharwani
Data engineer at Rsystema