<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I handle a task not serializable exception? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30705#M22289</link>
    <description>&lt;P&gt;@Nick Studenski, can you try declaring the un and pw variables outside the scope of foreachPartition? Look them up before the call, so that you are passing only the two string values into that function rather than the dbutils object.&lt;/P&gt;</description>
    <pubDate>Wed, 29 Sep 2021 20:15:54 GMT</pubDate>
    <dc:creator>vida</dc:creator>
    <dc:date>2021-09-29T20:15:54Z</dc:date>
    <item>
      <title>How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30698#M22282</link>
      <description />
      <pubDate>Sun, 08 Mar 2015 23:13:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30698#M22282</guid>
      <dc:creator>cfregly</dc:creator>
      <dc:date>2015-03-08T23:13:07Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30699#M22283</link>
      <description>&lt;P&gt;If you see this error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: ...
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The above error can be triggered when you initialize a variable on the driver, but then try to use it on one of the workers. In that case, Spark will try to serialize the object to send it over to the worker, and will fail if the object is not serializable. Consider the following code snippet:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;NotSerializable notSerializable = new NotSerializable();
JavaRDD&amp;lt;String&amp;gt; rdd = sc.textFile("/tmp/myfile");

rdd.map(s -&amp;gt; notSerializable.doSomething(s)).collect();
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This will trigger that error. Here are some ideas to fix this error:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Make the class implement Serializable.&lt;/LI&gt;&lt;LI&gt;Declare the instance only within the lambda function passed in map.&lt;/LI&gt;&lt;LI&gt;Make the NotSerializable object static and create it once per executor JVM.&lt;/LI&gt;&lt;LI&gt;Call rdd.foreachPartition and create the NotSerializable object in there like this:&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;&lt;CODE&gt;rdd.foreachPartition(iter -&amp;gt; {
  NotSerializable notSerializable = new NotSerializable();

  // ...Now process iter
});
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 08 Mar 2015 23:13:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30699#M22283</guid>
      <dc:creator>cfregly</dc:creator>
      <dc:date>2015-03-08T23:13:13Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30700#M22284</link>
      <description>&lt;P&gt;Also note that in Databricks Cloud, variables defined in a notebook cell may be serialized and shipped to the worker nodes so they can be accessed there. If you never need to use a variable in a transformation on a worker node, another fix is to declare it @transient in a Scala notebook:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;@transient val myNonSerializableObjectThatIDoNotUseInATransformation = ....&lt;/CODE&gt;&lt;/PRE&gt;
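&lt;P&gt;A minimal plain-Java sketch of why marking a field transient avoids the error, assuming Spark's default Java serialization of closures. It does not use Spark or Databricks: the class names (Connection, PlainHolder, TransientHolder, TransientDemo) are hypothetical, and ObjectOutputStream stands in for the serialization Spark performs when it ships a task. Scala's @transient annotation compiles to the same JVM transient modifier used here.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;
```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for an object that cannot be serialized,
// such as a client or connection handle.
class Connection {
}

// Serializable holder whose non-serializable field is NOT transient.
class PlainHolder implements Serializable {
    Connection conn = new Connection();
}

// Same holder, but the field is transient, so Java serialization skips it.
// Scala's @transient annotation produces this same JVM modifier.
class TransientHolder implements Serializable {
    transient Connection conn = new Connection();
}

class TransientDemo {
    // Returns true if Java serialization of obj succeeds, false if some
    // object reachable from it is not serializable.
    static boolean serializes(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("PlainHolder serializes: " + serializes(new PlainHolder()));
        System.out.println("TransientHolder serializes: " + serializes(new TransientHolder()));
    }
}
```
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Running main should print PlainHolder serializes: false and then TransientHolder serializes: true. A transient field is null after deserialization, which is why this only suits objects that are never used inside a transformation.&lt;/P&gt;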
</description>
      <pubDate>Fri, 22 May 2015 22:55:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30700#M22284</guid>
      <dc:creator>vida</dc:creator>
      <dc:date>2015-05-22T22:55:38Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30701#M22285</link>
      <description>&lt;P&gt;I cannot make the class serializable, and I don't want to create the instance in the lambda function again and again. So:&lt;/P&gt;
&lt;P&gt;1. How do I make the NotSerializable object static and create it once per machine?&lt;/P&gt;
&lt;P&gt;2. If I call rdd.foreachPartition, how can I get return values?&lt;/P&gt;</description>
      <pubDate>Sat, 31 Oct 2015 17:45:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30701#M22285</guid>
      <dc:creator>enjoyear</dc:creator>
      <dc:date>2015-10-31T17:45:18Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30702#M22286</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;You can use the Singleton pattern to create an object once per machine. It is explained well on Wikipedia:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://en.wikipedia.org/wiki/Singleton_pattern" target="test_blank"&gt;https://en.wikipedia.org/wiki/Singleton_pattern&lt;/A&gt;&lt;/P&gt;
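&lt;P&gt;A minimal Java sketch of a lazily created per-JVM singleton, using the initialization-on-demand holder idiom (one common way to implement the pattern, chosen here as an assumption; it is not a Spark API). All names are hypothetical, and SerializableFn stands in for the serializable function type Spark ships to a worker. The lambda that captures a local instance fails to serialize, while the one that calls NotSerializableSingleton.get() captures nothing, so each executor JVM would build its own instance on first use.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;
```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical non-serializable object that is expensive to create.
class NotSerializable {
    String doSomething(String s) { return s.toUpperCase(); }
}

// Initialization-on-demand holder: the JVM creates INSTANCE lazily, once per
// class loader, the first time get() is called; class initialization makes it thread-safe.
final class NotSerializableSingleton {
    private NotSerializableSingleton() {}

    private static final class Holder {
        static final NotSerializable INSTANCE = new NotSerializable();
    }

    static NotSerializable get() { return Holder.INSTANCE; }
}

// Serializable function type standing in for the function Spark ships to workers.
interface SerializableFn extends Serializable {
    String apply(String s);
}

class SingletonDemo {
    // Returns true if Java serialization of obj succeeds.
    static boolean serializes(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NotSerializable local = new NotSerializable();
        // Captures 'local', so serializing this lambda fails.
        SerializableFn capturing = s -> local.doSomething(s);
        // Captures nothing; the singleton is looked up on whichever JVM runs it.
        SerializableFn viaSingleton = s -> NotSerializableSingleton.get().doSomething(s);

        System.out.println("capturing serializes: " + serializes(capturing));
        System.out.println("viaSingleton serializes: " + serializes(viaSingleton));
        System.out.println(viaSingleton.apply("hello"));
    }
}
```
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The nested holder class is used instead of an explicit synchronized check because JVM class initialization already guarantees that INSTANCE is created once and safely published to all threads.&lt;/P&gt;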
&lt;P&gt;If you want to return values, you can use the mapPartitions transformation instead of the foreachPartition action.&lt;/P&gt;
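&lt;P&gt;A minimal plain-Java sketch of the mapPartitions pattern, without Spark: each partition is modelled as a String[] (a simplification; Spark actually passes an Iterator), the hypothetical ExpensiveParser is created once per partition, and the function returns one result per record. A foreachPartition version would do the same setup but return nothing.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;
```java
import java.util.Arrays;

// Hypothetical non-serializable helper that is expensive to create.
class ExpensiveParser {
    static int created = 0;  // demo-only count of instances in this JVM (not thread-safe)

    ExpensiveParser() { created++; }

    int parse(String s) { return Integer.parseInt(s.trim()); }
}

class PartitionDemo {
    // Plays the role of the function passed to mapPartitions: it receives one
    // partition's records, creates the parser once, and returns one value per record.
    static int[] mapPartition(String[] partition) {
        ExpensiveParser parser = new ExpensiveParser();  // once per partition
        return Arrays.stream(partition).mapToInt(parser::parse).toArray();
    }

    public static void main(String[] args) {
        // Two hypothetical partitions of a text RDD.
        String[][] partitions = { {"1", "2", "3"}, {"4", "5"} };
        for (String[] partition : partitions) {
            System.out.println(Arrays.toString(mapPartition(partition)));
        }
        System.out.println("parsers created: " + ExpensiveParser.created);
    }
}
```
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Running main should print [1, 2, 3], then [4, 5], then parsers created: 2, that is, one parser per partition rather than one per record.&lt;/P&gt;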
</description>
      <pubDate>Mon, 02 Nov 2015 19:14:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30702#M22286</guid>
      <dc:creator>vida</dc:creator>
      <dc:date>2015-11-02T19:14:37Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30703#M22287</link>
      <description>&lt;P&gt;Said differently (functionally): &lt;CODE&gt;mapPartitions()&lt;/CODE&gt; returns a value and should not have side effects. &lt;CODE&gt;foreachPartition()&lt;/CODE&gt; does not return a value, but (typically) does have side effects.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Nov 2015 22:19:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30703#M22287</guid>
      <dc:creator>cfregly</dc:creator>
      <dc:date>2015-11-03T22:19:02Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30704#M22288</link>
      <description>&lt;P&gt;@cfregly @Vida Ha&lt;/P&gt;&lt;P&gt;I'm having trouble with the same "task not serializable" error when calling foreachPartition.&lt;/P&gt;&lt;P&gt;My code looks like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import java.sql.DriverManager
import org.apache.spark.sql.Row

myDF
.foreachPartition { (rddpartition: Iterator[Row]) =&amp;gt;
  val url = "jdbc:sqlserver://&amp;lt;myurl&amp;gt;"
  val un = dbutils.secrets.get("my-secret-scope", "my-username")
  val pw = dbutils.secrets.get("my-secret-scope", "my-password")
  val connection = DriverManager.getConnection(url, un, pw)
  val statement = connection.createStatement()
  rddpartition.foreach { (row: Row) =&amp;gt;
    statement.addBatch(s"INSERT INTO dbo.Table(Field1, Field2) VALUES (${row.get(0)}, ${row.get(1)})")
  }
  statement.executeBatch()
  connection.close()
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The above code results in the error "org.apache.spark.SparkException: Task not serializable".&lt;/P&gt;&lt;P&gt;When I change the code to pass the username and password as string literals instead, as shown below, it works fine. Is there a way to work around dbutils.secrets.get not being serializable?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;DriverManager.getConnection(url, "&amp;lt;username string&amp;gt;", "&amp;lt;password string&amp;gt;")&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 10 May 2019 19:35:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30704#M22288</guid>
      <dc:creator>NickStudenski</dc:creator>
      <dc:date>2019-05-10T19:35:52Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30705#M22289</link>
      <description>&lt;P&gt;@Nick Studenski, can you try declaring the un and pw variables outside the scope of foreachPartition? Look them up before the call, so that you are passing only the two string values into that function rather than the dbutils object.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Sep 2021 20:15:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30705#M22289</guid>
      <dc:creator>vida</dc:creator>
      <dc:date>2021-09-29T20:15:54Z</dc:date>
    </item>
    <item>
      <title>Re: How do I handle a task not serializable exception?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30707#M22291</link>
      <description>&lt;P&gt;Hi @Nick Studenski, could you share how you solved your problem?&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jun 2022 04:03:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-handle-a-task-not-serializable-exception/m-p/30707#M22291</guid>
      <dc:creator>RajatS</dc:creator>
      <dc:date>2022-06-02T04:03:43Z</dc:date>
    </item>
  </channel>
</rss>

