<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Job fails after runtime upgrade in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14193#M8726</link>
    <description>&lt;P&gt;I have a job that runs with no issues on Databricks Runtime 7.3 LTS. After upgrading to 8.3, it fails with the error &lt;B&gt;An exception was thrown from a UDF: 'pyspark.serializers.SerializationError'... SparkContext should only be created and accessed on the driver&lt;/B&gt;&lt;/P&gt;&lt;P&gt;In the notebook I use applyInPandas to apply a UDF to each group. Inside this UDF I pull data from Snowflake using the Spark session (spark.read.format(...)), and I understand that this is why it fails.&lt;/P&gt;&lt;P&gt;My question is: why did it work on 7.3 LTS but not now? What changed?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
    <pubDate>Thu, 30 Sep 2021 10:54:36 GMT</pubDate>
    <dc:creator>NicolasEscobar</dc:creator>
    <dc:date>2021-09-30T10:54:36Z</dc:date>
    <item>
      <title>Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14193#M8726</link>
      <description>&lt;P&gt;I have a job that runs with no issues on Databricks Runtime 7.3 LTS. After upgrading to 8.3, it fails with the error &lt;B&gt;An exception was thrown from a UDF: 'pyspark.serializers.SerializationError'... SparkContext should only be created and accessed on the driver&lt;/B&gt;&lt;/P&gt;&lt;P&gt;In the notebook I use applyInPandas to apply a UDF to each group. Inside this UDF I pull data from Snowflake using the Spark session (spark.read.format(...)), and I understand that this is why it fails.&lt;/P&gt;&lt;P&gt;My question is: why did it work on 7.3 LTS but not now? What changed?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Thu, 30 Sep 2021 10:54:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14193#M8726</guid>
      <dc:creator>NicolasEscobar</dc:creator>
      <dc:date>2021-09-30T10:54:36Z</dc:date>
    </item>
    <item>
      <title>Re: Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14195#M8728</link>
      <description>&lt;P&gt;@Nicolas Escobar&amp;nbsp;- could you please share the full error stack trace?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Oct 2021 17:38:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14195#M8728</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2021-10-05T17:38:45Z</dc:date>
    </item>
    <item>
      <title>Re: Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14196#M8729</link>
      <description>&lt;P&gt;DBR 8.3 uses Spark 3.1.x. As per the &lt;A href="https://spark.apache.org/docs/latest/core-migration-guide.html#upgrading-from-core-30-to-31" alt="https://spark.apache.org/docs/latest/core-migration-guide.html#upgrading-from-core-30-to-31" target="_blank"&gt;migration guide&lt;/A&gt;, creating a SparkContext inside an executor is disallowed by default. You can allow it by setting &lt;B&gt;spark.executor.allowSparkContext&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;The guide states:&lt;/P&gt;&lt;P&gt;In Spark 3.0 and below, SparkContext can be created in executors. Since Spark 3.1, an exception will be thrown when creating SparkContext in executors. You can allow it by setting the configuration spark.executor.allowSparkContext when creating SparkContext in executors.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Oct 2021 08:18:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14196#M8729</guid>
      <dc:creator>User16763506586</dc:creator>
      <dc:date>2021-10-06T08:18:00Z</dc:date>
    </item>
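For reference, the setting named in the reply above could be added to the cluster's Spark config roughly as follows. This is a sketch under assumptions: it assumes the cluster-level Spark config field, which takes one space-separated key and value per line, and the value true is assumed from the flag being boolean. It only lifts the Spark 3.1 check on creating a SparkContext in executors. It does not make calls such as spark.read.format(...) supported inside a UDF.

```properties
spark.executor.allowSparkContext true
```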
    <item>
      <title>Re: Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14197#M8730</link>
      <description>&lt;P&gt;To clarify a bit more - in Spark, you can never use a SparkContext or SparkSession within a task / UDF. This has always been true. If it worked before, it's because the SparkContext was captured in your code and sent along accidentally, but I guess you never actually tried to use it; that would have failed. Now it just fails earlier.&lt;/P&gt;&lt;P&gt;The real solution is to change your code to not accidentally hold on to the SparkContext or SparkSession in your UDF.&lt;/P&gt;</description>
      <pubDate>Sun, 10 Oct 2021 16:24:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14197#M8730</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2021-10-10T16:24:36Z</dc:date>
    </item>
    <item>
      <title>Re: Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14198#M8731</link>
      <description>&lt;P&gt;Adding to @Sean Owen's comments: the only reason this was working is that the optimizer was evaluating it locally rather than creating a context on the executors and evaluating it there.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 11:33:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14198#M8731</guid>
      <dc:creator>User16873042682</dc:creator>
      <dc:date>2022-03-01T11:33:43Z</dc:date>
    </item>
    <item>
      <title>Re: Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14199#M8732</link>
      <description>&lt;P&gt;Thanks Sean for your answer, it's clear.&lt;/P&gt;&lt;P&gt;I was just wondering why the code was executing before with no errors and with the expected output. Now I understand that this is because there was no restriction before, and that this changed with the release of Spark 3.1, as Sandeep mentioned.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 11:59:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14199#M8732</guid>
      <dc:creator>NicolasEscobar</dc:creator>
      <dc:date>2022-03-07T11:59:35Z</dc:date>
    </item>
    <item>
      <title>Re: Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14200#M8733</link>
      <description>&lt;P&gt;Hi @Sean Owen, thanks for highlighting this. Could you please provide some sample code for what you mean by "not accidentally hold on to the SparkContext or SparkSession in your UDF"? Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jul 2022 14:34:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14200#M8733</guid>
      <dc:creator>Santhosh_Holla</dc:creator>
      <dc:date>2022-07-06T14:34:56Z</dc:date>
    </item>
    <item>
      <title>Re: Job fails after runtime upgrade</title>
      <link>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14201#M8734</link>
      <description>&lt;P&gt;There are 1000 ways this could happen, so not really, but they're all the same idea: you can't reference the SparkContext or SparkSession object, directly or indirectly, in a UDF. Simply put, you cannot use it in the UDF code.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jul 2022 14:39:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-fails-after-runtime-upgrade/m-p/14201#M8734</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2022-07-06T14:39:18Z</dc:date>
    </item>
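As a rough illustration of the idea in the reply above (not code from the original poster's job), one possible shape is to do the external read on the driver and hand the per-group function plain pandas data only. The names make_enricher, apply_per_group, events_sdf, lookup_sdf and the column "key" are hypothetical, and the sketch assumes the lookup table is small enough to collect to the driver.

```python
import pandas as pd


def make_enricher(lookup_pdf):
    """Build a per-group function that closes over plain pandas data only."""

    def enrich(group_pdf):
        # Runs on the executors: touches pandas objects only, never spark or sc.
        return group_pdf.merge(lookup_pdf, on="key", how="left")

    return enrich


def apply_per_group(events_sdf, lookup_sdf, schema):
    """Driver-side wiring; both DataFrame arguments are Spark DataFrames.

    lookup_sdf is assumed to have been read on the driver already, for
    example with spark.read.format(...), outside of any UDF.
    """
    # Collect the (assumed small) lookup table to pandas on the driver.
    lookup_pdf = lookup_sdf.toPandas()
    # Only lookup_pdf is captured by the function shipped to the executors.
    return events_sdf.groupBy("key").applyInPandas(
        make_enricher(lookup_pdf), schema=schema
    )
```

The per-group function can be checked locally with pandas alone, without a cluster. If the lookup data is too large to collect, joining the two Spark DataFrames before the groupBy is another way to keep the session out of the UDF.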
  </channel>
</rss>

