<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 69 tasks (4.0 GB) is bigger than spark.driver.maxResultSize (4.0 GB) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/16103#M10316</link>
    <description>&lt;P&gt;Hi @sachinmkp1@gmail.com&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You need to add this Spark configuration at your cluster level, not at the notebook level. When you add it to the cluster level it will apply the settings properly. For more details on this issue, please check our knowledge base article &lt;A href="https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html" alt="https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html" target="_blank"&gt;https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Thu, 16 Sep 2021 17:24:14 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2021-09-16T17:24:14Z</dc:date>
    <item>
      <title>org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 69 tasks (4.0 GB) is bigger than spark.driver.maxResultSize (4.0 GB)</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/16101#M10314</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;set spark.conf.set("spark.driver.maxResultSize", "20g")&lt;/P&gt;
&lt;P&gt;get spark.conf.get("spark.driver.maxResultSize") // 20g which is expected in notebook , I did not do in cluster level setting&lt;/P&gt;
&lt;P&gt;still getting 4g while executing the spark job , why?&lt;/P&gt;
&lt;P&gt;because of this job is getting failed.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Aug 2021 14:48:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/16101#M10314</guid>
      <dc:creator>sachinmkp1</dc:creator>
      <dc:date>2021-08-23T14:48:52Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 69 tasks (4.0 GB) is bigger than spark.driver.maxResultSize (4.0 GB)</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/16102#M10315</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;question is- when I go to set spark.driver.maxResultSize = 20g in notebook only , it is not taking while executing the job even when I try to get the spark.driver.maxResultSize value in notebook I am getting 20g.&lt;/P&gt;
&lt;P&gt;Still need clarification why does it behave like this?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Aug 2021 14:51:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/16102#M10315</guid>
      <dc:creator>sachinmkp1</dc:creator>
      <dc:date>2021-08-23T14:51:54Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 69 tasks (4.0 GB) is bigger than spark.driver.maxResultSize (4.0 GB)</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/16103#M10316</link>
      <description>&lt;P&gt;Hi @sachinmkp1@gmail.com&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You need to add this Spark configuration at your cluster level, not at the notebook level. When you add it to the cluster level it will apply the settings properly. For more details on this issue, please check our knowledge base article &lt;A href="https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html" alt="https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html" target="_blank"&gt;https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2021 17:24:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/16103#M10316</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-09-16T17:24:14Z</dc:date>
    </item>
  </channel>
</rss>

