cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Pandas API on Spark creates huge query plans

varshanagarajan
New Contributor

Hello,

I have a piece of code written in Pyspark and Pandas API on Spark. On comparing the query plans, I see Pandas API on Spark creates huge query plans whereas Pyspark plan is a tiny one. Furthermore, with Pandas API on spark, we see a lot of inconsistencies in the generated results. Pyspark code executes in 4 mins and Pandas API on Spark takes 16 mins. Do we have reasons or documented issues for these? Would like to know why this is happening so that I can address the problem

2 REPLIES 2

FRB1984
New Contributor II

Did you find an answer? 

Iโ€™ve noticed a similar situation where a simple pyspark.pandas query if far more complex and slower than a pyspark sql. 

BS_THE_ANALYST
Esteemed Contributor

@FRB1984 could you provide some examples? I'm curious. My first thoughts would be around the shuffling. Check this out: https://spark.apache.org/docs/3.5.4/api/python/user_guide/pandas_on_spark/best_practices.html . There's an argument to be made about how the code is being written. That'll play into the execution plan, naturally.

BS_THE_ANALYST_0-1755350591093.png

There's some other best practices worth noting on that documentation page:

BS_THE_ANALYST_1-1755350686312.png

@FRB1984  have you looked into this: 

BS_THE_ANALYST_2-1755351029710.png

The index type will likely have a contribution and consequence to the execution plan being larger. You can alter this setting in the options: https://spark.apache.org/docs/latest/api/python/tutorial/pandas_on_spark/options.html 

Let me know how you get on.

All the best,
BS

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now