Databricks Community

brickster_2018 · 06-25-2021

brickster_2018 · 06-25-2021

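By default, Spark automatically broadcasts a table only if its estimated size is at most 10 MB, which is the default value of spark.sql.autoBroadcastJoinThreshold.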

spark.sql.autoBroadcastJoinThreshold can be increased, but only up to 8 GB.
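For illustration, here is a minimal Python model of the size check behind automatic broadcasting. It is a sketch, not Spark's code: the function name would_auto_broadcast is hypothetical, and it assumes the optimizer simply compares the table's estimated size in bytes with spark.sql.autoBroadcastJoinThreshold, where a value of -1 disables automatic broadcasting.

```python
# Hypothetical model of the auto-broadcast size check (not Spark's source code).
# Assumption: a table is auto-broadcast when its *estimated* size in bytes is
# less than or equal to spark.sql.autoBroadcastJoinThreshold.

DEFAULT_AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # 10 MB default (10485760 bytes)


def would_auto_broadcast(estimated_size_bytes: int,
                         threshold_bytes: int = DEFAULT_AUTO_BROADCAST_THRESHOLD) -> bool:
    """Return True if a table of this estimated size would be auto-broadcast."""
    # A negative threshold (-1) disables automatic broadcasting entirely.
    return threshold_bytes >= 0 and estimated_size_bytes <= threshold_bytes


if __name__ == "__main__":
    mb = 1024 * 1024
    print(would_auto_broadcast(5 * mb))                             # → True  (under the 10 MB default)
    print(would_auto_broadcast(50 * mb))                            # → False (too big for the default)
    print(would_auto_broadcast(50 * mb, threshold_bytes=100 * mb))  # → True  (after raising the threshold)
    print(would_auto_broadcast(1 * mb, threshold_bytes=-1))         # → False (auto-broadcast disabled)
```

In a real session the threshold itself is changed with spark.conf.set("spark.sql.autoBroadcastJoinThreshold", <bytes>); because the check uses the optimizer's size estimate, the actual broadcast data can end up larger or smaller than the estimate.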

There is an upper limit on the number of records as well: we can't broadcast more than 512 million records. So the cap is either 512 million records or 8 GB, whichever limit is hit first.
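The "whichever limit is hit first" rule can be sketched as a small check. This is a minimal model, not Spark's implementation: broadcast_limit_hit is a hypothetical helper, "8GB" is assumed to mean 8 * 2**30 bytes, "512m" is assumed to mean 512 million rows, and treating a value exactly at the cap as too big is also an assumption.

```python
from typing import Optional

# Assumed meanings of the limits quoted in the answer (not read from Spark itself):
MAX_BROADCAST_BYTES = 8 * 1024 ** 3  # "8GB", taken here as 8 * 2**30 bytes
MAX_BROADCAST_ROWS = 512_000_000     # "512m records", taken as 512 million rows


def broadcast_limit_hit(num_rows: int, size_bytes: int) -> Optional[str]:
    """Return which cap a table would hit ("rows" or "bytes"), or None if it fits.

    Hypothetical helper: both caps apply at once, so whichever one is reached
    blocks the broadcast. Rows are checked before bytes in this sketch.
    """
    if num_rows >= MAX_BROADCAST_ROWS:
        return "rows"
    if size_bytes >= MAX_BROADCAST_BYTES:
        return "bytes"
    return None


if __name__ == "__main__":
    # 600 million narrow rows of ~8 bytes (~4.8 GB): the row cap is hit before 8 GB.
    print(broadcast_limit_hit(600_000_000, 600_000_000 * 8))    # → rows
    # 100 million wide rows of ~200 bytes (~20 GB): the byte cap is hit first.
    print(broadcast_limit_hit(100_000_000, 100_000_000 * 200))  # → bytes
    # 1 million rows in 100 MB: under both caps.
    print(broadcast_limit_hit(1_000_000, 100 * 1024 * 1024))    # → None
```

The examples show why the answer gives two limits: narrow tables tend to run into the row cap, while wide tables run into the byte cap.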


lchari · 11-16-2024

Is the limit per "table/dataframe" or for all tables/dataframes put together?

The driver collects the data from all executors that hold the respective table or DataFrame and then distributes it to all executors. When is this memory released on the driver and on the executors? Or is it held throughout the pipeline/application?


What is the maximum limit of data that can be broadcasted using broadcast join
