
Databricks issue: assertion failed: Invalid shuffle partition specs

KumarShiv
New Contributor III

I have a complex script that processes more than 100 GB of data and runs some aggregations on it. At the end, I simply try to write or display the data from the DataFrame, and I get this error: assertion failed: Invalid shuffle partition specs:

Please help if anyone has an idea.



5 REPLIES

Hubert-Dudek
Esteemed Contributor III

It is hard to help in that case without seeing the whole code.

I have attached the ".py" file. Please have a look.

Hubert-Dudek
Esteemed Contributor III

Please use

display(df_FinalAction)

Spark is lazily evaluated, but "display" triggers execution, so you can debug by displaying each DataFrame at the end of each cell.
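The cell-by-cell approach can also be sketched as a small helper. This is only an illustration, not the code from the attached script: the helper name `first_failing_step` is hypothetical, and it is written without any Spark import so it works with whatever "force" function you pass in.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def first_failing_step(
    steps: Iterable[Tuple[str, Callable[[], Any]]],
    force: Callable[[Any], Any],
) -> Optional[Tuple[str, Exception]]:
    """Force each lazy step in order and return the first one that raises.

    steps: (name, builder) pairs; each builder returns the lazy object
           (for example a DataFrame) for that step.
    force: a function that triggers real execution of the object,
           for example display or a count in a notebook.
    Returns (name, exception) for the first failure, or None if all pass.
    """
    for name, build in steps:
        try:
            # Building is usually cheap; forcing is what runs the query.
            force(build())
        except Exception as exc:  # report whatever the step raised
            return name, exc
    return None
```

In a notebook this might be called as `first_failing_step([("df_Aggregated", lambda: df_Aggregated), ("df_FinalAction", lambda: df_FinalAction)], force=display)`, where `df_Aggregated` is a made-up name for an intermediate DataFrame. A cheaper `force` such as `lambda df: df.count()` may not run exactly the same plan as displaying or writing the full result, so treat a clean pass with it as a hint only.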

Thanks, Dudek. I finally found the query that had the issue. As you suggested, I debugged step by step and ran each cell individually, then added spark.conf.set("spark.sql.shuffle.partitions",100) in that cell. It's resolved 😀
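If you would rather not change the setting for the rest of the notebook, one option is to apply it only around the failing cell and restore the old value afterwards. The sketch below is an assumption about how that could look, not something from the original script: `temporary_conf` is a hypothetical helper, and it only relies on the `get`, `set` and `unset` methods that `spark.conf` provides in PySpark.

```python
from contextlib import contextmanager


@contextmanager
def temporary_conf(conf, key, value):
    """Set conf[key] to value inside the with-block, then restore it.

    conf is expected to behave like spark.conf (get/set/unset).
    """
    previous = conf.get(key, None)  # None means the key had no value
    conf.set(key, value)
    try:
        yield
    finally:
        if previous is None:
            conf.unset(key)
        else:
            conf.set(key, previous)
```

Usage in the problem cell could look like `with temporary_conf(spark.conf, "spark.sql.shuffle.partitions", "100"): display(df_FinalAction)`. Because Spark is lazy, the action (display or write) has to run inside the `with` block for the setting to apply to it. The value 100 is the one that worked in this thread; other workloads may need a different number.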

Hubert-Dudek
Esteemed Contributor III

Thanks, that is excellent. If you want, you can select my answer as the best one 🙂
