04-19-2022 12:28 PM
to_pandas() is only suitable for small datasets, because it collects all of the data into the driver's memory.
Please use this instead:
to_pandas_on_spark()
It is essential to use pandas on Spark instead of ordinary pandas so that the work runs in a distributed way. More info: https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html
So always import the pandas API on Spark as:
import pyspark.pandas as ps
My blog: https://databrickster.medium.com/