Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What is the maximum amount of data that can be broadcast using a broadcast join?

brickster_2018
Databricks Employee
1 ACCEPTED SOLUTION

brickster_2018
Databricks Employee

By default, a table is broadcast automatically only if its estimated size is up to 10 MB.

spark.sql.autoBroadcastJoinThreshold controls this limit and can be increased up to 8 GB.
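As a rough illustration of raising the threshold, here is a minimal Python sketch. The helper name `set_auto_broadcast_threshold` and the 8 GB guard are assumptions made for this example, not part of Spark; the only Spark call used is `spark.conf.set`, and `spark` is assumed to be an existing SparkSession.

```python
# 8 GB ceiling quoted in the answer; the guard below is this sketch's own check.
MAX_BROADCAST_BYTES = 8 * 1024**3


def set_auto_broadcast_threshold(spark, threshold_bytes):
    """Set spark.sql.autoBroadcastJoinThreshold on an existing session.

    `spark` is assumed to be a SparkSession (any object exposing
    `conf.set(key, value)` works). Hypothetical helper, not a Spark API.
    """
    if threshold_bytes > MAX_BROADCAST_BYTES:
        raise ValueError("threshold exceeds the 8 GB broadcast ceiling")
    # The setting is expressed in bytes; -1 disables automatic broadcasting.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(threshold_bytes))
    return threshold_bytes
```

In a notebook this might be called as `set_auto_broadcast_threshold(spark, 100 * 1024 * 1024)` to raise the threshold to 100 MB.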

There is also an upper limit on the number of records: we can't broadcast more than 512 million records. So the effective limit is either 512 million records or 8 GB, whichever is reached first.
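To make the "whichever is reached first" rule concrete, here is a small sketch in plain Python. The function name and the constants are assumptions taken from the figures in this answer (8 GB read as 8 * 1024^3 bytes, 512m as 512,000,000); the exact boundaries inside Spark may differ by version.

```python
MAX_BROADCAST_BYTES = 8 * 1024**3   # 8 GB, as stated in the answer (assumed binary GB)
MAX_BROADCAST_ROWS = 512_000_000    # 512 million records, as stated in the answer


def broadcast_limit_exceeded(size_bytes, row_count):
    """Return the name of the first limit exceeded, or None if both are met.

    Hypothetical helper for illustration; it does not call Spark.
    """
    if row_count > MAX_BROADCAST_ROWS:
        return "rows"
    if size_bytes > MAX_BROADCAST_BYTES:
        return "bytes"
    return None


# A narrow table can hit the row limit long before the size limit:
# 600 million rows at 4 bytes each is only about 2.2 GiB.
print(broadcast_limit_exceeded(600_000_000 * 4, 600_000_000))  # rows
print(broadcast_limit_exceeded(9 * 1024**3, 1_000_000))        # bytes
print(broadcast_limit_exceeded(50 * 1024**2, 1_000_000))       # None
```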

2 REPLIES

lchari
New Contributor II

Is the limit per table/DataFrame, or for all tables/DataFrames combined?

The driver collects the data from all executors (those holding the respective table or DataFrame) and distributes it to all executors. When is the memory released on the driver and on the executors? Or is it held throughout the pipeline/application?
