<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to increase spark.kryoserializer.buffer.max in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-increase-spark-kryoserializer-buffer-max/m-p/30306#M21953</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;val conf = new SparkConf()&lt;/P&gt;
&lt;P&gt;...&lt;/P&gt;
&lt;P&gt;conf.set("spark.kryoserializer.buffer.max.mb", "512")&lt;/P&gt;
&lt;P&gt;...&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 03 Aug 2017 13:34:42 GMT</pubDate>
    <dc:creator>Jose_Maria_Tala</dc:creator>
    <dc:date>2017-08-03T13:34:42Z</dc:date>
    <item>
      <title>How to increase spark.kryoserializer.buffer.max</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-increase-spark-kryoserializer-buffer-max/m-p/30304#M21951</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;When I join two DataFrames, I get the following error.&lt;/P&gt;
&lt;PRE&gt;org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1
Serialization trace:
values (org.apache.spark.sql.catalyst.expressions.GenericRow)
otherElements (org.apache.spark.util.collection.CompactBuffer).
To avoid this, increase spark.kryoserializer.buffer.max value.
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:253)
    at org.apache.spark.sql.execution.SparkSqlSerializer$$anonfun$serialize$1.apply(SparkSqlSerializer.scala:90)
    at org.apache.spark.sql.execution.SparkSqlSerializer$$anonfun$serialize$1.apply(SparkSqlSerializer.scala:89)
    at org.apache.spark.sql.execution.SparkSqlSerializer$.acquireRelease(SparkSqlSerializer.scala:82)
    at org.apache.spark.sql.execution.SparkSqlSerializer$.serialize(SparkSqlSerializer.scala:89)
    at org.apache.spark.sql.execution.joins.GeneralHashedRelation.writeExternal(HashedRelation.scala:65)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:203)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
    at org.apache.spark.broadcast.TorrentBroadcast.&amp;lt;init&amp;gt;(TorrentBroadcast.scala:85)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1289)&lt;/PRE&gt;
&lt;P&gt;So how do I increase spark.kryoserializer.buffer.max in Databricks cloud? &lt;A href="http://spark.apache.org/docs/latest/configuration.html" target="_blank"&gt;http://spark.apache.org/docs/latest/configuration.html&lt;/A&gt; does not explain how to do this on Databricks cloud.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Aug 2015 05:25:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-increase-spark-kryoserializer-buffer-max/m-p/30304#M21951</guid>
      <dc:creator>letsflykite</dc:creator>
      <dc:date>2015-08-01T05:25:03Z</dc:date>
    </item>
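    <!--
      A minimal sketch (not from the thread) of what the error message asks for. The
      Kryo buffer limit is picked up when the SparkContext and its executors start, so
      in a self-managed Spark application it is set on the SparkConf before the context
      is created; on Databricks cloud the context is created for you, which is what the
      replies below address. The app name and the "512m" value are illustrative
      assumptions, not recommendations.

        import org.apache.spark.{SparkConf, SparkContext}

        // Set the serializer and its maximum buffer size up front; changing
        // spark.kryoserializer.buffer.max after the context exists has no effect.
        val conf = new SparkConf()
          .setAppName("kryo-buffer-example")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.kryoserializer.buffer.max", "512m")

        val sc = new SparkContext(conf)
    -->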
    <item>
      <title>Re: How to increase spark.kryoserializer.buffer.max</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-increase-spark-kryoserializer-buffer-max/m-p/30305#M21952</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@letsflykite If you go to Databricks Guide -&amp;gt; Spark -&amp;gt; Configuring Spark, you'll find a guide on how to change some of the Spark configuration settings using init scripts. The ability to change these more easily through the UI is also on the near-term roadmap.&lt;/P&gt;
&lt;P&gt;One word of caution: it should be fairly rare to need to change these settings. Needing to usually means something in the code is not performing as expected, and that is what leads to the error.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Aug 2015 17:01:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-increase-spark-kryoserializer-buffer-max/m-p/30305#M21952</guid>
      <dc:creator>arsalan1</dc:creator>
      <dc:date>2015-08-07T17:01:46Z</dc:date>
    </item>
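    <!--
      A small follow-up sketch (not from the thread) for checking that an init-script
      change was picked up, assuming a Scala notebook attached to the cluster after it
      has been restarted:

        // Read the effective value back from the running SparkContext's conf;
        // getOption returns None if the key was never set.
        val effective = sc.getConf.getOption("spark.kryoserializer.buffer.max")
        println(s"spark.kryoserializer.buffer.max = $effective")
    -->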
    <item>
      <title>Re: How to increase spark.kryoserializer.buffer.max</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-increase-spark-kryoserializer-buffer-max/m-p/30306#M21953</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;val conf = new SparkConf()&lt;/P&gt;
&lt;P&gt;...&lt;/P&gt;
&lt;P&gt;conf.set("spark.kryoserializer.buffer.max.mb", "512")&lt;/P&gt;
&lt;P&gt;...&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Aug 2017 13:34:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-increase-spark-kryoserializer-buffer-max/m-p/30306#M21953</guid>
      <dc:creator>Jose_Maria_Tala</dc:creator>
      <dc:date>2017-08-03T13:34:42Z</dc:date>
    </item>
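    <!--
      The reply above elides how the conf object is used; a fuller sketch under the
      same assumption of a self-managed application (Spark 2.x entry point shown, and
      "512m" kept as an illustrative size):

        import org.apache.spark.SparkConf
        import org.apache.spark.sql.SparkSession

        val conf = new SparkConf()
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.kryoserializer.buffer.max", "512m")

        // The buffer limit is fixed once the session (and its executors) exist,
        // so it is passed to the builder rather than set afterwards.
        val spark = SparkSession.builder().config(conf).getOrCreate()
    -->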
  </channel>
</rss>

