How to apply Pandas functions on PySpark DataFrame?
Hi, I want to apply pandas functions (like isna, concat, append, etc.) to a PySpark DataFrame in such a way that computations are done on a multi-node cluster. I don't want to convert the PySpark DataFrame into a pandas DataFrame since, I think, only one node is...
- 2044 Views
- 2 replies
- 3 kudos
Latest Reply
The best option is to use the pandas API on Spark (`pyspark.pandas`). It is virtually interchangeable with pandas; it is just a different API over Spark DataFrames:

```python
import pyspark.pandas as ps

psdf = ps.range(10)
sdf = psdf.to_spark().filter("id > 5")
sdf.show()
```
- 3 kudos