cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

What is the maximum limit of data that can be broadcasted using broadcast join

brickster_2018
Databricks Employee
Databricks Employee
 
1 ACCEPTED SOLUTION

Accepted Solutions

brickster_2018
Databricks Employee
Databricks Employee

By default, only 10 MB of data can be broadcasted.

spark.sql.autoBroadcastJoinThreshold can be increased up to 8GB

There is an upper limit in terms of records as well. We can't broadcast more than 512m records. So its either 512m records or 8GB which ever limit hits first

View solution in original post

2 REPLIES 2

brickster_2018
Databricks Employee
Databricks Employee

By default, only 10 MB of data can be broadcasted.

spark.sql.autoBroadcastJoinThreshold can be increased up to 8GB

There is an upper limit in terms of records as well. We can't broadcast more than 512m records. So its either 512m records or 8GB which ever limit hits first

lchari
New Contributor II

Is the limit per "table/dataframe" or for all tables/dataframes put together?

The driver collects the data from all executors (which are having the respective table or dataframe) and distributes to all executors. When will the memory be released in both driver and executor? Or does it hold on to this memory through out the pipeline/application?

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now