<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Re-establish SparkSession using Databricks connect after cluster restart in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/re-establish-sparksession-using-databricks-connect-after-cluster/m-p/69460#M33895</link>
    <description>&lt;P&gt;If anyone encounters this problem, the solution that worked for me was to restart the Jupyter kernel.&lt;/P&gt;</description>
    <pubDate>Sat, 18 May 2024 06:59:02 GMT</pubDate>
    <dc:creator>Michael_Chein</dc:creator>
    <dc:date>2024-05-18T06:59:02Z</dc:date>
    <item>
      <title>Re-establish SparkSession using Databricks connect after cluster restart</title>
      <link>https://community.databricks.com/t5/data-engineering/re-establish-sparksession-using-databricks-connect-after-cluster/m-p/64394#M32559</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;When developing locally with Databricks Connect, how do I re-establish the SparkSession after the cluster has restarted? &lt;FONT face="courier new,courier"&gt;getOrCreate()&lt;/FONT&gt; seems to return the old, invalid SparkSession even after the cluster restart instead of creating a new one. Or am I missing something?&lt;/P&gt;&lt;P&gt;Before the cluster restart, everything works fine:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&amp;gt;&amp;gt; spark = DatabricksSession.builder.getOrCreate()
DEBUG:databricks.connect:IPython module is present.
DEBUG:databricks.connect:Falling back to default configuration from the SDK.
INFO:databricks.sdk:loading DEFAULT profile from ~/.databrickscfg: host, token, cluster_id
DEBUG:databricks.sdk:Attempting to configure auth: pat
DEBUG:databricks.connect:Creating SparkSession from SDK config: &amp;lt;Config: host=https://adb-**************.**.azuredatabricks.net, token=***, auth_type=pat, cluster_id=****-******-********&amp;gt;
DEBUG:databricks.connect:Validating configuration by using the Databricks SDK
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): adb-6130442328907134.14.azuredatabricks.net:443
DEBUG:urllib3.connectionpool:https://adb-*******************.**.azuredatabricks.net:443 "GET /api/2.0/clusters/get?cluster_id=****-******-******** HTTP/1.1" 200 None
DEBUG:databricks.sdk:GET /api/2.0/clusters/get?cluster_id=****-******-********
&amp;lt; 200 OK
&amp;lt; {
&amp;lt;&amp;lt;&amp;lt; REDACTED: long message with api response &amp;gt;&amp;gt;&amp;gt;
&amp;lt; }
DEBUG:databricks.connect:Session validated successfully.

&amp;gt;&amp;gt; spark.sql("SELECT now()")
Out[7]: DataFrame[now(): timestamp]&lt;/LI-CODE&gt;&lt;P&gt;After the cluster restart:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&amp;gt;&amp;gt; spark = DatabricksSession.builder.getOrCreate()
DEBUG:databricks.connect:IPython module is present.
DEBUG:databricks.connect:Falling back to default configuration from the SDK.
INFO:databricks.sdk:loading DEFAULT profile from ~/.databrickscfg: host, token, cluster_id
DEBUG:databricks.sdk:Attempting to configure auth: pat

&amp;gt;&amp;gt; spark.sql("SELECT now()")
Traceback (most recent call last):
  File "C:\***\lib\site-packages\IPython\core\interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "&amp;lt;ipython-input-9-4c2039c39977&amp;gt;", line 1, in &amp;lt;module&amp;gt;
    spark.sql("SELECT now()")
  File "C:\***\lib\site-packages\pyspark\sql\connect\session.py", line 572, in sql
    data, properties = self.client.execute_command(cmd.command(self._client))
  File "C:\***\lib\site-packages\pyspark\sql\connect\client\core.py", line 1139, in execute_command
    data, _, _, _, properties = self._execute_and_fetch(req, observations or {})
  File "C:\***\lib\site-packages\pyspark\sql\connect\client\core.py", line 1515, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req, observations):
  File "C:\***\lib\site-packages\pyspark\sql\connect\client\core.py", line 1493, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "C:\***\lib\site-packages\pyspark\sql\connect\client\core.py", line 1805, in _handle_error
    raise error
  File "C:\***\lib\site-packages\pyspark\sql\connect\client\core.py", line 1486, in _execute_and_fetch_as_iterator
    yield from handle_response(b)
  File "C:\***\lib\site-packages\pyspark\sql\connect\client\core.py", line 1406, in handle_response
    self._verify_response_integrity(b)
  File "C:\***\lib\site-packages\pyspark\sql\connect\client\core.py", line 1937, in _verify_response_integrity
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: Received incorrect server side session identifier for request. Please create a new Spark Session to reconnect. (5601ab48-a7cf-40c6-b59c-460381c816a6 != 8282a8c4-13cd-4fda-906e-2b1d8bec2115)&lt;/LI-CODE&gt;&lt;P&gt;Shouldn't &lt;FONT face="courier new,courier"&gt;getOrCreate()&lt;/FONT&gt; recognize that it has to create a new session? Am I doing something wrong? How do I force the creation of a new session? I cannot use &lt;FONT face="courier new,courier"&gt;spark.stop()&lt;/FONT&gt;, since it fails with the same error.&lt;/P&gt;&lt;P&gt;I am using &lt;FONT face="courier new,courier"&gt;databricks-connect 14.3.1, python 3.10.12&lt;/FONT&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Mar 2024 12:39:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/re-establish-sparksession-using-databricks-connect-after-cluster/m-p/64394#M32559</guid>
      <dc:creator>MarkusFra</dc:creator>
      <dc:date>2024-03-22T12:39:20Z</dc:date>
    </item>
    <item>
      <title>Re: Re-establish SparkSession using Databricks connect after cluster restart</title>
      <link>https://community.databricks.com/t5/data-engineering/re-establish-sparksession-using-databricks-connect-after-cluster/m-p/64673#M32623</link>
      <description>&lt;P&gt;Thank you for your reply, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;. However, the availability of databricks-connect is not the issue. I had a bit of time to look into it and found that the problem does not occur with databricks-connect 13.3 and a cluster with Runtime 13.3. It does occur with databricks-connect 14.3 and a cluster with Runtime 14.3.&lt;/P&gt;&lt;P&gt;databricks-connect-13.3 and Runtime 13.3 Cluster:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("DEBUGGING_133").getOrCreate()
spark.sql("SELECT 1")
# output: DataFrame[1: int]

# &amp;gt;&amp;gt;&amp;gt; Databricks cluster shuts down (e.g. because of a timeout caused by a long-running script)

spark.sql("SELECT 1")
# Cluster starts again automatically
# output: DataFrame[1: int]&lt;/LI-CODE&gt;&lt;P&gt;databricks-connect-14.3 and Runtime 14.3 Cluster:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("DEBUGGING_133").getOrCreate()
spark.sql("SELECT 1")
# output: DataFrame[1: int]

# &amp;gt;&amp;gt;&amp;gt; Databricks cluster shuts down (e.g. because of a timeout caused by a long-running script)

spark.sql("SELECT 1")
# Cluster starts again automatically
# output: 
Traceback (most recent call last):
  File "****\lib\site-packages\IPython\core\interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "&amp;lt;ipython-input-7-e8eb9b165388&amp;gt;", line 1, in &amp;lt;module&amp;gt;
    spark.sql("SELECT 1")
  File "****\lib\site-packages\pyspark\sql\connect\session.py", line 572, in sql
    data, properties = self.client.execute_command(cmd.command(self._client))
  File "****\lib\site-packages\pyspark\sql\connect\client\core.py", line 1139, in execute_command
    data, _, _, _, properties = self._execute_and_fetch(req, observations or {})
  File "****\lib\site-packages\pyspark\sql\connect\client\core.py", line 1515, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req, observations):
  File "****\lib\site-packages\pyspark\sql\connect\client\core.py", line 1493, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "****\lib\site-packages\pyspark\sql\connect\client\core.py", line 1805, in _handle_error
    raise error
  File "****\lib\site-packages\pyspark\sql\connect\client\core.py", line 1486, in _execute_and_fetch_as_iterator
    yield from handle_response(b)
  File "****\lib\site-packages\pyspark\sql\connect\client\core.py", line 1406, in handle_response
    self._verify_response_integrity(b)
  File "****\lib\site-packages\pyspark\sql\connect\client\core.py", line 1937, in _verify_response_integrity
    raise PySparkAssertionError(
pyspark.errors.exceptions.base.PySparkAssertionError: Received incorrect server side session identifier for request. Please create a new Spark Session to reconnect. (ab413162-708a-423f-84c7-b04969ed3bf4 != 3c8ea3e4-e20f-4a31-82a0-ff938f4017c6)&lt;/LI-CODE&gt;&lt;P&gt;Could this be a bug? Where can I find known issues or report this one?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Mar 2024 14:59:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/re-establish-sparksession-using-databricks-connect-after-cluster/m-p/64673#M32623</guid>
      <dc:creator>MarkusFra</dc:creator>
      <dc:date>2024-03-26T14:59:08Z</dc:date>
    </item>
    <item>
      <title>Re: Re-establish SparkSession using Databricks connect after cluster restart</title>
      <link>https://community.databricks.com/t5/data-engineering/re-establish-sparksession-using-databricks-connect-after-cluster/m-p/69460#M33895</link>
      <description>&lt;P&gt;If anyone encounters this problem, the solution that worked for me was to restart the Jupyter kernel.&lt;/P&gt;</description>
      <pubDate>Sat, 18 May 2024 06:59:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/re-establish-sparksession-using-databricks-connect-after-cluster/m-p/69460#M33895</guid>
      <dc:creator>Michael_Chein</dc:creator>
      <dc:date>2024-05-18T06:59:02Z</dc:date>
    </item>
  </channel>
</rss>

