
Dan_Z
Databricks Employee

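mapInPandas is one of the most powerful functions in Spark. It uses Apache Arrow's in-memory columnar format to split a Spark DataFrame into chunks and feed them to a function that takes an iterator of pandas DataFrames as input and returns an iterator of pandas DataFrames as output. Because the function sees one chunk at a time, the output can have a different number of rows than the input. Check it out here: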

https://spark.apache.org/docs/3.0.0/sql-pyspark-pandas-with-arrow.html#map
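To make that concrete, here is a minimal sketch of how a call might look, assuming Spark 3.0+ with pandas and PyArrow installed. The column names, sample rows, and the `keep_adults` and `demo` helpers are made up for illustration and are not from the linked docs.

```python
from typing import Iterator

import pandas as pd


def keep_adults(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Filter each chunk that Spark hands over as a pandas DataFrame."""
    for pdf in batches:
        # Each pdf holds only a slice of one partition, not the whole dataset.
        yield pdf[pdf["age"] >= 18]


def demo() -> None:
    # Imported here so keep_adults can also be tried with plain pandas.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical sample data: (id, age)
    df = spark.createDataFrame([(1, 21), (2, 15), (3, 40)], ["id", "age"])
    # The schema argument describes the columns that keep_adults yields.
    df.mapInPandas(keep_adults, schema="id long, age long").show()
```

Calling `demo()` in a notebook or script with a Spark session available should show only the rows with age 18 or above. Since the function works on an iterator of chunks, any logic that needs to see the whole dataset at once (a global sort, for example) is not a good fit for this pattern.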