Spark Driver Out of Memory Issue

chandan_a_v
Valued Contributor

Hi,

I am running a simple job in Databricks and getting the error below. I increased the driver size, but I still hit the same issue.

Spark config :

from pyspark.sql import SparkSession

spark_session = (
    SparkSession.builder
    .appName("Demand Forecasting")
    .config("spark.yarn.executor.memoryOverhead", 2048)
    .getOrCreate()
)

Driver and worker node type: r5.2xlarge

10 worker nodes.

Error Log:

Caused by: org.apache.spark.sql.execution.OutOfMemorySparkException: Size of broadcasted table far exceeds estimates and exceeds limit of spark.driver.maxResultSize=4294967296.
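
For reference, the limit named in the error can be inspected from a notebook. A minimal sketch, assuming the `spark` session Databricks provides; the 8g value is only illustrative, and spark.driver.maxResultSize itself has to be raised in the cluster's Spark config rather than at runtime:

# Inspect the current driver result-size cap (falls back to "unset"
# if the config has not been set on this cluster).
print(spark.conf.get("spark.driver.maxResultSize", "unset"))

# Raising it cannot be done from a running notebook; it belongs in the
# cluster's Spark config, e.g. (illustrative value, not a recommendation):
#   spark.driver.maxResultSize 8g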

ACCEPTED SOLUTION

chandan_a_v
Valued Contributor

Hi @Kaniz Fatma,

Switching the runtime version to 10.4 fixed the issue for me.

Thanks,

Chandan


REPLIES

-werners-
Esteemed Contributor III

Looking at the error message, you are trying to broadcast a large table. Remove the broadcast statement on the large table and you will be fine.
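
For illustration only (this is not the OP's code; big_df and small_df are hypothetical stand-ins), the difference looks like this:

from pyspark.sql.functions import broadcast

# An explicit hint collects the whole table on the driver before shipping
# it to every executor; on a large table this is what blows past
# spark.driver.maxResultSize.
joined = small_df.join(broadcast(big_df), "id")

# Without the hint, Spark is free to fall back to a shuffle join.
joined = small_df.join(big_df, "id")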

chandan_a_v
Valued Contributor

Hi @Werner Stinckens,

I am getting this issue while writing a Spark DataFrame as a Parquet file to AWS S3. I am not actually doing any broadcast join.

Thanks,

Chandan

Hubert-Dudek
Esteemed Contributor III

In my opinion, on Databricks you don't need to create the session yourself (spark_session = SparkSession.builder.appName("Demand Forecasting").config("spark.yarn.executor.memoryOverhead", 2048).getOrCreate()); one is already provided, and spark.yarn.executor.memoryOverhead is a YARN setting that a Databricks cluster does not use. The rest is as @Werner Stinckens said.
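
A minimal sketch of what that looks like in a notebook; the table name and S3 path below are hypothetical placeholders:

# Databricks notebooks already expose a ready-made SparkSession as `spark`,
# so no builder call is needed.
df = spark.table("my_database.my_table")  # hypothetical table
df.write.mode("overwrite").parquet("s3://my-bucket/demand_forecasting/")  # hypothetical path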


-werners-
Esteemed Contributor III

As Hubert mentioned, you should not create a Spark session on Databricks; one is provided for you.

The fact that you do not broadcast manually makes me think Spark is choosing a broadcast join on its own.

There is a KB about issues with that:

https://kb.databricks.com/sql/bchashjoin-exceeds-bcjointhreshold-oom.html

Can you check if it is applicable?
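
If it does apply, a common session-level workaround for this class of issue (not necessarily the KB's exact fix) is to shrink or disable the automatic broadcast threshold. A hedged sketch with illustrative values:

# -1 disables automatic broadcast joins entirely; alternatively, set a
# smaller byte threshold (e.g. 10485760 for 10 MB).
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)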

Kaniz
Community Manager

Hi @Chandan Angadi, just a friendly follow-up. Do you still need help, or did @Hubert Dudek's and @Werner Stinckens's responses help you find the solution? Please let us know.

