Broadcast Join Failure in Streaming: Failed to store executor broadcast in BlockManager

pooja_bhumandla
Databricks Partner

Hi Databricks Community,

I’m running a Structured Streaming job in Databricks that uses foreachBatch to write to a Delta table. The job fails with the following error:

Failed to store executor broadcast spark_join_relation_1622863
(size = Some(67141632)) in BlockManager with storageLevel=StorageLevel(memory, deserialized, 1 replicas)

I understand that Spark may choose a broadcast join when it estimates that one side of the join is small enough to be sent to every executor.
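For context, my current understanding (from the Spark SQL configuration docs, not yet verified against my job) is that this decision is driven by a size threshold. Here is the kind of change I'm considering, just as a sketch:

```python
# Sketch of the configs I believe control automatic broadcast joins.
# Values here are strings as Spark accepts them; -1 disables the behavior.
broadcast_confs = {
    # Spark auto-broadcasts a join side when its estimated size is below
    # this threshold (default 10485760 bytes = 10 MB); -1 disables it.
    "spark.sql.autoBroadcastJoinThreshold": "-1",
    # With AQE enabled, Spark can still convert a join to broadcast at
    # runtime; this separate threshold (Spark 3.2+) may also need -1.
    "spark.sql.adaptive.autoBroadcastJoinThreshold": "-1",
}

# Applied at session level, e.g.:
# for key, value in broadcast_confs.items():
#     spark.conf.set(key, value)
print(broadcast_confs)
```

Is tuning (or disabling) these thresholds the right lever here, or is there something streaming-specific I'm missing?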

How exactly does Spark decide when to perform a broadcast join?
What are the recommended ways to handle or avoid broadcast join memory errors in streaming operations?
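One workaround I've seen mentioned is forcing a shuffle-based join for the problematic join only, via a join hint, rather than disabling broadcast globally. A hypothetical sketch (table and column names are placeholders, not my actual schema):

```python
# Hypothetical: per-join MERGE hint (Spark 3.0+) to request a sort-merge
# join instead of a broadcast join for one specific join inside foreachBatch.
# fact_stream / small_dim / id / attr are placeholder names.
query = """
SELECT /*+ MERGE(small_dim) */ f.id, d.attr
FROM fact_stream f
JOIN small_dim d ON f.id = d.id
"""

# Would be run as: spark.sql(query)
print(query)
```

Would a per-join hint like this be preferable to lowering the global threshold?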

Any suggestions, configuration tips, or best practices would be greatly appreciated.

Thanks in advance!