Data Engineering

How to increase spark.kryoserializer.buffer.max

letsflykite
New Contributor II

When I join two DataFrames, I get the following error:

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1
Serialization trace:
values (org.apache.spark.sql.catalyst.expressions.GenericRow)
otherElements (org.apache.spark.util.collection.CompactBuffer). To avoid this, increase spark.kryoserializer.buffer.max value.
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:253)
    at org.apache.spark.sql.execution.SparkSqlSerializer$$anonfun$serialize$1.apply(SparkSqlSerializer.scala:90)
    at org.apache.spark.sql.execution.SparkSqlSerializer$$anonfun$serialize$1.apply(SparkSqlSerializer.scala:89)
    at org.apache.spark.sql.execution.SparkSqlSerializer$.acquireRelease(SparkSqlSerializer.scala:82)
    at org.apache.spark.sql.execution.SparkSqlSerializer$.serialize(SparkSqlSerializer.scala:89)
    at org.apache.spark.sql.execution.joins.GeneralHashedRelation.writeExternal(HashedRelation.scala:65)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:203)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1289)

So how do I increase spark.kryoserializer.buffer.max in Databricks Cloud? The configuration page at http://spark.apache.org/docs/latest/configuration.html does not explain how to do this on Databricks Cloud.

2 REPLIES

arsalan1
Contributor

@letsflykite If you go to Databricks Guide -> Spark -> Configuring Spark, you'll find a guide on how to change some of the Spark configuration settings using init scripts. The ability to change these settings more easily through the UI is also on the near-term roadmap.

One word of caution: it should be fairly rare to need to change these settings. Needing to change them typically means that something in the code is not performing as expected and is leading to the error.
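
As one illustration of that caution: the stack trace in the question passes through TorrentBroadcast and GeneralHashedRelation, which suggests the failing join was being run as a broadcast hash join, so the whole hashed relation had to be serialized at once. The following is only a sketch of how one might rule that out, not a diagnosis of the original job. It assumes Spark 2.0 or later (SparkSession); the object name, column names, and sample rows are made up for illustration. On older versions the same property would be set with sqlContext.setConf.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object BroadcastJoinCheck {
  // Turns off size-based automatic broadcast joins, then joins two tiny
  // illustrative DataFrames on "id". All names and rows here are hypothetical.
  def joinWithoutAutoBroadcast(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // -1 disables automatic broadcasting of the smaller join side.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    val left = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (2, "y")).toDF("id", "r")
    left.join(right, "id")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-check") // hypothetical app name
      .master("local[1]")              // local mode, for a quick check only
      .getOrCreate()

    val joined = joinWithoutAutoBroadcast(spark)
    joined.explain() // prints the physical plan so the join strategy can be inspected
    spark.stop()
  }
}
```

If the physical plan no longer shows a broadcast join and the error disappears, the size of the broadcast side was probably the real problem, and raising the Kryo buffer would only have hidden it.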

Jose_Maria_Tala
New Contributor II

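import org.apache.spark.SparkConf

val conf = new SparkConf()

...

// Since Spark 1.4 the key is spark.kryoserializer.buffer.max and takes a size string such as "512m".
// The older spark.kryoserializer.buffer.max.mb key is deprecated. The value must stay below 2048m.
conf.set("spark.kryoserializer.buffer.max", "512m")

...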
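
For completeness, here is a minimal, self-contained sketch of the same idea for an application that creates its own SparkContext. It uses the spark.kryoserializer.buffer.max key named in the error message. The object name, app name, and the 512m value are illustrative assumptions, not recommendations. On a Databricks cluster the context already exists when a notebook runs, so as far as I know the property has to be supplied when the cluster starts (for example through the cluster's Spark configuration) rather than from notebook code.

```scala
import org.apache.spark.SparkConf

object KryoBufferConf {
  // Builds a SparkConf with Kryo enabled and a larger maximum Kryo buffer.
  // `maxBuffer` is a size string such as "512m"; Spark requires it to be below 2048m.
  def build(maxBuffer: String): SparkConf =
    new SparkConf(false) // false: do not load spark.* system properties
      .setAppName("kryo-buffer-example") // hypothetical app name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.max", maxBuffer)

  def main(args: Array[String]): Unit = {
    val conf = build("512m")
    // Read the setting back as a number of mebibytes; "64m" is the fallback if the key is unset.
    println(conf.getSizeAsMb("spark.kryoserializer.buffer.max", "64m"))
  }
}
```

The conf only takes effect if it is passed to the SparkContext (or SparkSession builder) before the context is created. Changing it afterwards does not resize the buffers of a running application.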
