
Databricks issue: assertion failed: Invalid shuffle partition specs

KumarShiv
New Contributor III

I have a complex script that consumes more than 100 GB of data and runs some aggregations on it. At the end, I simply try to write/display the data from the DataFrame, and I get this error: (assertion failed: Invalid shuffle partition specs: ).

Please help if anyone has an idea.

[Attachment: DB_Issue]

1 ACCEPTED SOLUTION


Thanks, Dudek. I finally found the query that had the issue. As you suggested, I debugged step by step, running each cell, and added (spark.conf.set("spark.sql.shuffle.partitions",100)) in that cell. It's resolved 😀


5 REPLIES

Hubert-Dudek
Esteemed Contributor III

It is hard to help in that case without seeing the whole code.

I've attached the ".py" file. Please have a look.

Hubert-Dudek
Esteemed Contributor III

Please use

display(df_FinalAction)

Spark evaluates lazily, but "display" triggers execution immediately, so you can debug by displaying each DataFrame at the end of each cell. The first cell that fails is the one containing the problem query.
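The cell-by-cell check can also be scripted. This is only a minimal sketch, not code from the thread: first_failing_step and its parameters are hypothetical names, and the action you pass in (for example display, or a call to df.count()) is an assumption about what you want to use to force evaluation.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def first_failing_step(
    steps: Iterable[Tuple[str, Callable[[], Any]]],
    action: Callable[[Any], Any],
) -> Optional[Tuple[str, Exception]]:
    """Force each lazily defined step in order.

    `steps` is a sequence of (name, builder) pairs, where each builder
    returns the DataFrame for that step. `action` is anything that makes
    Spark actually execute the plan, such as display or df.count().

    Returns (name, exception) for the first step that raises,
    or None if every step evaluates successfully.
    """
    for name, build in steps:
        try:
            action(build())  # evaluation happens here, not at definition time
        except Exception as exc:  # JVM-side failures surface as Python exceptions
            return name, exc
    return None
```

In a notebook this might be called as first_failing_step([("df_agg", lambda: df_agg), ("df_FinalAction", lambda: df_FinalAction)], action=lambda df: df.count()), where df_agg is a made-up name for an intermediate DataFrame. Each step re-runs its upstream work, so on 100 GB of input it may be worth testing on a sample first.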

Thanks, Dudek. I finally found the query that had the issue. As you suggested, I debugged step by step, running each cell, and added (spark.conf.set("spark.sql.shuffle.partitions",100)) in that cell. It's resolved 😀
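If you want the setting to apply only to the problem cell, one option is to set it temporarily and restore the previous value afterwards. This is a hedged sketch rather than what was done in the thread: shuffle_partitions is a hypothetical helper name, spark is assumed to be the notebook's SparkSession, and 100 is simply the value reported above. The thread does not explain why this value resolved the error.

```python
from contextlib import contextmanager

SHUFFLE_KEY = "spark.sql.shuffle.partitions"


@contextmanager
def shuffle_partitions(spark, n):
    """Temporarily set spark.sql.shuffle.partitions, then restore it.

    `spark` is assumed to be an active SparkSession; only
    spark.conf.get and spark.conf.set are used.
    """
    previous = spark.conf.get(SHUFFLE_KEY)  # current value, returned as a string
    spark.conf.set(SHUFFLE_KEY, str(n))
    try:
        yield
    finally:
        # Restore the earlier value even if the query inside the block fails.
        spark.conf.set(SHUFFLE_KEY, previous)
```

Because Spark is lazy, the action itself has to run inside the block for the setting to take effect, for example: with shuffle_partitions(spark, 100): display(df_FinalAction).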

Hubert-Dudek
Esteemed Contributor III

Thanks, that's excellent. If you'd like, you can select my answer as the best one 🙂
