12-12-2024 03:42 AM
Hello,
I am facing this error when moving a Workflow to serverless mode
ERROR : SparkException: Job aborted due to stage failure: Serialized task 482:0 was 269355219 bytes, which exceeds max allowed: spark.rpc.message.maxSize (268435456 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.
On a job cluster we could set spark.rpc.message.maxSize manually to a value greater than 268 MB, but that does not seem possible on Serverless.
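For reference, this is roughly how we set it on a classic job cluster (a sketch only; runtime, node type, and the 512 value are illustrative, and the setting has to go into the cluster's Spark config because it is static, with the value given in MiB):

```python
# Illustrative "new_cluster" block as you might pass it to the Jobs API or an
# asset bundle; only spark_conf matters here, the rest are example values.
job_cluster_spec = {
    "spark_version": "15.4.x-scala2.12",      # example runtime
    "node_type_id": "Standard_DS3_v2",        # example node type
    "num_workers": 2,
    "spark_conf": {
        "spark.rpc.message.maxSize": "512",   # raise above the 256 MiB limit from the error
    },
}
```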
Any help is appreciated, thx
12-12-2024 04:51 AM
Hi @adurand-accure,
In serverless mode, you cannot directly modify the spark.rpc.message.maxSize parameter. To work around this limitation, you can consider alternatives such as using broadcast variables for large values (as the error message itself suggests) or restructuring the job so that large objects are not shipped inside task closures or collected to the driver.
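As a minimal sketch of the broadcast idea with the DataFrame API (table and column names here are made up; the low-level RDD broadcast-variable API is not exposed on serverless, but broadcast joins should still work):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: a large fact table and a small lookup/dimension table.
facts = spark.range(1_000_000).withColumnRenamed("id", "key")
lookup = spark.range(10_000).withColumnRenamed("id", "key")

# Let Spark broadcast the small side in the join instead of collecting it to
# the driver and capturing it in task closures, which inflates the serialized
# task size.
joined = facts.join(broadcast(lookup), on="key", how="left")
print(joined.count())
```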
12-12-2024 06:41 AM
Hello Alberto,
Thanks, I already had this answer from the AI assistant and it didn't solve my problem; I am looking here for something different.
12-12-2024 12:27 PM
Hey @adurand-accure
Without details on how your workflow works it can be hard to help. If the job fails on the part of the workflow where you process large chunks of data, then partitioning or batching is probably your answer, roughly as in the sketch below. Are you able to share some details?
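Something like this (hypothetical table names, assuming the Databricks-provided spark session, just to show the shape of it):

```python
# Keep the work distributed and bound what each task carries, instead of
# pulling everything back to the driver in one shot.
df = spark.table("my_catalog.my_schema.big_table")        # hypothetical table

# Option A: control the partition count so no single task payload gets huge.
df.repartition(200).write.mode("overwrite").saveAsTable(
    "my_catalog.my_schema.big_table_out"                  # hypothetical target
)

# Option B: if rows really must reach the driver, iterate over them lazily
# instead of collect()-ing them all at once.
for row in df.toLocalIterator():
    ...  # handle one row at a time
```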
12-12-2024 12:36 PM
Hello PiotrMi,
We found out that the problem was caused by a collect() call and managed to fix it by changing some code.
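Roughly the kind of change we made (simplified, with made-up table and column names):

```python
# Before (problematic): collect() pulled the whole result back to the driver,
# which is what ran into the spark.rpc.message.maxSize limit.
# rows = spark.table("my_catalog.my_schema.big_table").collect()

# After: push the aggregation into Spark so only a small summary is collected.
summary = (
    spark.table("my_catalog.my_schema.big_table")   # hypothetical table
    .groupBy("status")                              # hypothetical column
    .count()
    .collect()                                      # now only a few rows
)
```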
Thanks for your quick replies
Best regards,
Antoine