Hubert-Dudek
Databricks MVP

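toPandas() on a Spark DataFrame (to_pandas() on a pandas-on-Spark DataFrame) collects the whole dataset into the driver's memory, so it is only suitable for small datasets.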

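Please use this instead:

to_pandas_on_spark()

In Spark 3.3 and later this method is deprecated in favour of pandas_api(), which does the same conversion.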

Use the pandas API on Spark instead of ordinary pandas so that the work runs in a distributed way across the cluster rather than on the driver alone. More info: https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html

So when working with Spark data, import the pandas API as:

import pyspark.pandas as ps
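
To make the conversion concrete, here is a minimal sketch, not a definitive implementation. The helper name to_pandas_on_spark_df and the function example_on_cluster are hypothetical names made up for this illustration. It assumes a Spark DataFrame that exposes either pandas_api() (newer Spark versions) or to_pandas_on_spark() (older ones), and the example function assumes a running Spark session such as a Databricks cluster.

```python
def to_pandas_on_spark_df(sdf, index_col=None):
    """Convert a Spark DataFrame to a pandas-on-Spark DataFrame.

    Hypothetical helper: the data stays distributed, nothing is
    collected to the driver. Prefers pandas_api() and falls back to
    to_pandas_on_spark() when the newer method is not available.
    """
    if hasattr(sdf, "pandas_api"):
        return sdf.pandas_api(index_col=index_col)
    return sdf.to_pandas_on_spark(index_col=index_col)


def example_on_cluster():
    """Illustrative usage; needs PySpark, so imports are done lazily here."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # A small demo DataFrame with a single column named "value".
    sdf = spark.range(1000).withColumnRenamed("id", "value")

    # Distributed, pandas-like DataFrame.
    psdf = to_pandas_on_spark_df(sdf)
    print(psdf["value"].mean())

    # Only bring a small slice to the driver as ordinary pandas.
    small_pdf = psdf.head(10).to_pandas()
    print(type(small_pdf))
```

Calling example_on_cluster() in a notebook should show the pattern: do the heavy work on the pandas-on-Spark DataFrame, and call to_pandas() only on a small result.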


My blog: https://databrickster.medium.com/