07-27-2022 12:41 AM
I have a complex script that consumes more than 100 GB of data, runs some aggregations on it, and at the end simply tries to write/display data from a DataFrame. At that point I get this error: `assertion failed: Invalid shuffle partition specs:`.
Please help me here if anyone has an idea.
07-27-2022 03:42 AM
It is hard to help in that case without seeing the whole code.
07-27-2022 04:02 AM
Added ".py" file in attachment, Pls have a look.
07-27-2022 06:10 AM
Please use
display(df_FinalAction)
Spark is lazily evaluated, but "display" is not, so you can debug by displaying each DataFrame at the end of each cell.
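The suggestion above can be sketched like this. A hedged example: `display` is a Databricks notebook helper (not plain PySpark), `df_FinalAction` comes from the attached script, and the input path and column names here are hypothetical; outside Databricks you could force evaluation with an action such as `df.count()` instead.

```python
# Cell-by-cell debugging sketch (Databricks notebook).
# Spark transformations are lazy, so the cell that *defines* a DataFrame
# can succeed even though its query plan is broken. Forcing an action at
# the end of each cell surfaces the failing step immediately.

# Cell 1 -- hypothetical input path
df_step1 = spark.read.parquet("/path/to/input")
display(df_step1)                                  # action: forces evaluation

# Cell 2 -- hypothetical aggregation
df_step2 = df_step1.groupBy("some_key").count()
display(df_step2)                                  # fails here if this step is the culprit
```

Running the cells in order like this pinpoints which query raises the `Invalid shuffle partition specs` assertion.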
07-28-2022 07:54 AM
Thanks Dudek, I finally found the query that had the issue. As you suggested, I debugged step by step, running every cell, and added `spark.conf.set("spark.sql.shuffle.partitions", 100)` in that cell. It's resolved 😀
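For anyone hitting the same error: `spark.sql.shuffle.partitions` controls how many partitions Spark SQL uses when shuffling data for joins and aggregations (the default is 200), and in some Spark versions this assertion is raised during adaptive shuffle-partition coalescing, which pinning the partition count can work around. A minimal sketch of the fix, assuming an existing Databricks/PySpark session named `spark`:

```python
# Pin the number of shuffle partitions before running the problematic
# aggregation; this changes how the shuffle is planned for this session.
spark.conf.set("spark.sql.shuffle.partitions", 100)

# Verify the setting took effect for the current session.
print(spark.conf.get("spark.sql.shuffle.partitions"))
```

The right value depends on data volume and cluster size; 100 is simply what worked in this thread.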
07-29-2022 04:10 AM
Thanks, that's excellent. If you want, you can select my answer as the best one 🙂