05-05-2022 11:23 PM
Hi,
I am executing a simple job in Databricks and getting the error below. I increased the driver size, but I still face the same issue.
Spark config :
from pyspark.sql import SparkSession
spark_session = SparkSession.builder.appName("Demand Forecasting").config("spark.yarn.executor.memoryOverhead", 2048).getOrCreate()
Driver and worker node type: r5.2xlarge
10 worker nodes.
Error Log:
Caused by: org.apache.spark.sql.execution.OutOfMemorySparkException: Size of broadcasted table far exceeds estimates and exceeds limit of spark.driver.maxResultSize=4294967296.
05-05-2022 11:54 PM
Looking at the error message, you are trying to broadcast a large table. Remove the broadcast statement on the large table and you should be fine.
05-08-2022 12:05 PM
HI @Werner Stinckens ,
I am getting the above issue while writing a Spark DataFrame as a Parquet file to AWS S3. I am not actually doing any broadcast join.
Thanks,
Chandan
05-06-2022 09:04 AM
In my opinion, on Databricks you don't need to specify spark_session = SparkSession.builder.appName("Demand Forecasting").config("spark.yarn.executor.memoryOverhead", 2048).getOrCreate(); the session is already provided. The rest is as @Werner Stinckens said.
05-11-2022 06:25 AM
As Hubert mentioned, you should not create a Spark session on Databricks; it is provided.
The fact that you do not broadcast manually makes me think Spark is using an automatic broadcast join.
There is a KB about issues with that:
https://kb.databricks.com/sql/bchashjoin-exceeds-bcjointhreshold-oom.html
Can you check if it is applicable?
05-13-2022 04:03 AM
Hi @Chandan Angadi, just a friendly follow-up. Do you still need help, or did @Hubert Dudek's and @Werner Stinckens's responses help you find the solution? Please let us know.
06-02-2022 01:51 AM
Hi @Kaniz Fatma ,
Switching the runtime version to 10.4 fixed the issue for me.
Thanks,
Chandan