07-27-2022 12:41 AM
I have a complex script that consumes more than 100 GB of data, runs some aggregations on it, and at the end simply writes/displays the data from a DataFrame. At that point I get this error: `assertion failed: Invalid shuffle partition specs:`.
Please help me here if anyone has an idea.
07-27-2022 03:42 AM
It is hard to help in that case without seeing the whole code.
07-27-2022 04:02 AM
I have added the ".py" file as an attachment, please have a look.
07-27-2022 06:10 AM
Please use
display(df_FinalAction)
Spark is lazily evaluated, but "display" is not, so you can debug by displaying each DataFrame at the end of each cell.
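A minimal sketch of this cell-by-cell debugging approach in a Databricks notebook (where `display` and `spark` are notebook globals). The DataFrame names `df_raw` and `df_agg` and the column names here are placeholders, not from the attached script:

```python
from pyspark.sql import functions as F

# Cell 1: force evaluation of the input so a bad read surfaces immediately.
display(df_raw)

# Cell 2: does the aggregation step trigger the error?
df_agg = df_raw.groupBy("key").agg(F.sum("value").alias("total"))
display(df_agg)

# If a cell fails with "Invalid shuffle partition specs", pinning the number
# of shuffle partitions for that cell was what resolved it for the original
# poster:
spark.conf.set("spark.sql.shuffle.partitions", 100)
```

Because transformations are lazy, the error from a 100 GB pipeline only appears at the final action; forcing an action per cell narrows it down to the offending query.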
07-28-2022 07:54 AM
Thanks Dudek, I finally found the query that had the issue. As you suggested, I debugged step by step and ran each and every cell, then added spark.conf.set("spark.sql.shuffle.partitions", 100) in that cell. That resolved it 😀
07-29-2022 04:10 AM
Thanks, that is excellent. If you want, you can select my answer as the best one 🙂