cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Getting 'Multiple failure in stage materialization' error in one of my Job with notebook task

sandy_123
New Contributor II
Multiple failures in stage materialization. it tried using powerful Job cluster but it does not work out?? Any suggestions how should i fix it?
 
FYI- my dataframe(uniq_rec_df) has around 30M rows

 

-------Screenshot attached---

sandy_123_0-1768844333336.png

 

 


1 REPLY 1

Saritha_S
Databricks Employee
Databricks Employee

Hi @sandy_123 

The "Multiple failures in stage materialization" error at line 120 is caused by a massive shuffle bottleneck

Check the spark UI and try to understand the reason for the failure such as RPC, heartbeat error etc

Primary Issues:

  1. Window Function Computed Multiple Times (Lines 26-117)
  • The window function computing geo_routing_flag using partition by concat(...) forces Spark to shuffle ALL data by visitor ID
  • This happens separately for each data source, causing multiple expensive shuffles
  • If you have skewed visitor IDs (some with thousands of records), certain partitions become extremely large
  1. Memory Pressure from Chain of Shuffles
  • By the time you reach the final write at line 120, accumulated shuffle operations have created memory pressure
  • Straggler tasks (some partitions processing 10-100x more data) cause the job to appear "stuck"
  1. Write Operation Triggering Final Shuffle
  • The .mode("overwrite") operation is the final straw that triggers one more massive shuffle
  • If using dynamic partitioning, this multiplies the cost further

To overcome the issue try the cache the frequently used df and see if you can tune the size of the executor by adding any config or see if increasing the executor helps in resolving the issue.