06-25-2021 02:51 PM
Accepted Solutions
06-25-2021 02:53 PM
By default, only 10 MB of data can be broadcast.
spark.sql.autoBroadcastJoinThreshold can be increased up to 8 GB.
There is an upper limit in terms of records as well: we can't broadcast more than 512 million records. So it's either 512 million records or 8 GB, whichever limit is hit first.
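Here is a minimal PySpark sketch of how the threshold can be raised and how a broadcast can be forced with a hint. The table names (fact_large, dim_small) and the 1 GB value are illustrative assumptions, not from the original post.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-threshold-demo").getOrCreate()

# Raise the auto-broadcast threshold from the 10 MB default to 1 GB (value in bytes).
# Setting it to -1 disables automatic broadcast joins entirely.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 1 * 1024 * 1024 * 1024)

# Hypothetical tables: dim_small is assumed to fit under the threshold.
fact_df = spark.table("fact_large")
dim_df = spark.table("dim_small")

# Spark may now broadcast dim_df automatically; the explicit hint forces it
# regardless of the threshold (still subject to the 8 GB / 512 million row caps).
joined = fact_df.join(broadcast(dim_df), on="key")
joined.explain()  # look for BroadcastHashJoin in the physical plan
```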
11-16-2024 06:01 AM
Is the limit per table/DataFrame, or for all tables/DataFrames put together?
The driver collects the data from the executors that hold the respective table or DataFrame and then distributes it to all executors. When is that memory released on both the driver and the executors? Or is it held for the duration of the pipeline/application?

