07-27-2022 12:41 AM
I have a complex script that consumes more than 100 GB of data, runs some aggregations on it, and at the end simply tries to write/display data from a DataFrame. At that point I get this error: `assertion failed: Invalid shuffle partition specs:`.
Please help me here if anyone has an idea.
07-27-2022 03:42 AM
It is hard to help in that case without seeing the whole code.
07-27-2022 04:02 AM
Added ".py" file in attachment, Pls have a look.
07-27-2022 06:10 AM
Please use
display(df_FinalAction)
Spark is lazily evaluated, but "display" is not, so you can debug by displaying each DataFrame at the end of each cell.
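The suggestion above can be sketched like this. A hedged example: `display` is a Databricks notebook helper (not plain PySpark), `df_FinalAction` comes from the attached script, and the input path and column names here are hypothetical; outside Databricks you could force evaluation with an action such as `df.count()` instead.

```python
# Cell-by-cell debugging sketch (Databricks notebook).
# Spark transformations are lazy, so the cell that *defines* a DataFrame
# can succeed even though its query plan is broken. Forcing an action at
# the end of each cell surfaces the failing step immediately.

# Cell 1 -- hypothetical input path
df_step1 = spark.read.parquet("/path/to/input")
display(df_step1)                                  # action: forces evaluation

# Cell 2 -- hypothetical aggregation
df_step2 = df_step1.groupBy("some_key").count()
display(df_step2)                                  # fails here if this step is the culprit
```

Running the cells in order like this pinpoints which query raises the `Invalid shuffle partition specs` assertion.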
07-28-2022 07:54 AM
Thanks Dudek, I finally found the query that had the issue. As you suggested, I debugged step by step, running every cell, and added `spark.conf.set("spark.sql.shuffle.partitions", 100)` in that cell. It's resolved 😀
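For anyone hitting the same error: `spark.sql.shuffle.partitions` controls how many partitions Spark SQL uses when shuffling data for joins and aggregations (the default is 200), and in some Spark versions this assertion is raised during adaptive shuffle-partition coalescing, which pinning the partition count can work around. A minimal sketch of the fix, assuming an existing Databricks/PySpark session named `spark`:

```python
# Pin the number of shuffle partitions before running the problematic
# aggregation; this changes how the shuffle is planned for this session.
spark.conf.set("spark.sql.shuffle.partitions", 100)

# Verify the setting took effect for the current session.
print(spark.conf.get("spark.sql.shuffle.partitions"))
```

The right value depends on data volume and cluster size; 100 is simply what worked in this thread.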
07-29-2022 04:10 AM
Thanks, that's excellent. If you want, you can select my answer as the best one 🙂