<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Installing Databricks Connect breaks pyspark local cluster mode in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/100644#M40367</link>
    <description>&lt;P&gt;Databricks-Connect is by design a drop-in replacement for pyspark, essentially. It transparently takes over execution of the Spark parts without change to the program, and is definitely a different 'environment' from local pyspark.&lt;/P&gt;
&lt;P&gt;As with any situation where you need to deal with separate software environments, you'd typically have a separate venv for each, and as such could have pyspark in one and databricks-connect in another. Does that not answer this?&lt;/P&gt;</description>
    <pubDate>Mon, 02 Dec 2024 15:35:45 GMT</pubDate>
    <dc:creator>sean_owen</dc:creator>
    <dc:date>2024-12-02T15:35:45Z</dc:date>
    <item>
      <title>Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/67994#M33510</link>
      <description>&lt;P&gt;Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. Local mode has been especially useful in testing, for running unit tests for Spark-related code without any remote session.&lt;/P&gt;&lt;P&gt;Without databricks-connect this code works fine to initialize a local Spark session:&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;spark = SparkSession.Builder().master("local[1]").getOrCreate()&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;However, when the databricks-connect Python package is installed, that same code fails with&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&amp;gt;&amp;nbsp;RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Could not find connection parameters to start a Spark remote session.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Question: why does it work like this? Also, is this documented anywhere? I do not see it mentioned in the Databricks Connect Troubleshooting or Limitations documentation pages. The same issue has been raised at&amp;nbsp;&lt;A href="https://github.com/databricks/databricks-vscode/issues/1152" target="_blank"&gt;Running pytest with local spark session · Issue #1152 · databricks/databricks-vscode · GitHub.&lt;/A&gt;&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 03 May 2024 06:14:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/67994#M33510</guid>
      <dc:creator>htu</dc:creator>
      <dc:date>2024-05-03T06:14:32Z</dc:date>
    </item>
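Given the behavior described in the question, one defensive option (a heuristic sketch, not an official API; it relies only on the fact that databricks-connect ships its own slimmed `pyspark` distribution) is to detect at runtime which variant is installed before attempting a local master:

```python
import importlib.metadata


def local_master_supported() -> bool:
    """Heuristic: if databricks-connect is installed, its bundled pyspark
    refuses local masters, so report that local mode is unavailable."""
    try:
        importlib.metadata.version("databricks-connect")
        return False
    except importlib.metadata.PackageNotFoundError:
        return True
```

A test suite could then skip Spark-local tests when this returns False instead of failing with the RuntimeError quoted in the post.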
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/68021#M33524</link>
      <description>&lt;P&gt;Hi, I understand what Databricks Connect is used for (that's why I'm trying it out), but I would also like to be able to run tests. What do you mean by "different local mode"?&lt;/P&gt;&lt;P&gt;As a side topic, I tried running pytest tests with a Databricks Connect session (both a spark-connect server running in a container at sc://localhost and Azure Databricks via DatabricksSession), and some of the tests fail with "Windows fatal exception: access violation" in both cases, so that doesn't really work either.&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2024 09:26:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/68021#M33524</guid>
      <dc:creator>htu</dc:creator>
      <dc:date>2024-05-03T09:26:04Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/84758#M37225</link>
      <description>&lt;P&gt;Could you please provide an example of the proposed workaround? Nothing I have tried helps; I always get the same error as in the original post. This is very frustrating, to say the least: the inability to switch properly and natively to local mode.&lt;/P&gt;&lt;P&gt;I look forward to hearing from you.&lt;/P&gt;&lt;P&gt;Kind regards, Dmytro.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Aug 2024 19:30:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/84758#M37225</guid>
      <dc:creator>dmytro</dc:creator>
      <dc:date>2024-08-27T19:30:43Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/89361#M37769</link>
      <description>&lt;P&gt;Hey guys, I am facing the same issue. Before Databricks Connect, the unit tests with pytest were working properly.&lt;/P&gt;&lt;P&gt;I've even tried creating a newSession() and using the pytest-spark library, but neither approach worked.&lt;/P&gt;&lt;P&gt;I got the following error:&lt;/P&gt;&lt;P&gt;E RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Use DatabricksSession.builder to create a remote Spark session instead.&lt;BR /&gt;E Refer to &lt;A href="https://docs.databricks.com/dev-tools/databricks-connect.html" target="_blank"&gt;https://docs.databricks.com/dev-tools/databricks-connect.html&lt;/A&gt; on how to configure Databricks Connect.&lt;/P&gt;&lt;P&gt;.venv\Lib\site-packages\pyspark\sql\session.py:552: RuntimeError&lt;BR /&gt;===================================================================== short test summary info ======================================================================&lt;BR /&gt;ERROR tests/unit/test_functions.py::test_if_function_add_year_month - RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Use DatabricksSession.builder to create a remote Spark session instead.&lt;BR /&gt;========================================================================= 1 error in 0.17s =========================================================================&lt;/P&gt;&lt;P&gt;Let's wait for a solution. If I have any news, I'll update the topic here. Cheers.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Sep 2024 19:24:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/89361#M37769</guid>
      <dc:creator>dpires92</dc:creator>
      <dc:date>2024-09-10T19:24:30Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/90616#M37964</link>
      <description>&lt;P&gt;This is ridiculous. It's absolutely unacceptable as "intended behavior" for a professional software package to clobber the functionality of another package just by being installed.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Sep 2024 17:10:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/90616#M37964</guid>
      <dc:creator>Angus-Dawson</dc:creator>
      <dc:date>2024-09-16T17:10:01Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/90690#M37975</link>
      <description>&lt;P&gt;Indeed. This became more obvious when I looked at the databricks-connect wheel package contents: it also includes a pyspark package. The pyspark inside it is about 9 MB, whereas the regular pyspark package is over 300 MB. I guess they've kept only the spark-connect client-side parts and removed the whole server part. That kind of makes sense, but it should not be done in a way that replaces an existing package.&lt;/P&gt;&lt;P&gt;I even tried connecting to a local (Docker-hosted) Spark, but it crashes on some test cases.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Sep 2024 07:45:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/90690#M37975</guid>
      <dc:creator>htu</dc:creator>
      <dc:date>2024-09-17T07:45:53Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/96542#M39289</link>
      <description>&lt;P&gt;Also frustrated by this behavior. databricks-connect should not replace the local pyspark installation.&lt;/P&gt;&lt;P&gt;Is there any solution to this?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2024 16:18:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/96542#M39289</guid>
      <dc:creator>Kolath</dc:creator>
      <dc:date>2024-10-28T16:18:53Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/96569#M39295</link>
      <description>&lt;P&gt;I managed to work around it in Poetry by using optional dependency groups; when I want to switch between Databricks Connect and local PySpark functionality I run this:&lt;/P&gt;&lt;PRE&gt;poetry install --with &amp;lt;group x&amp;gt; --without &amp;lt;group y&amp;gt; --sync&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 Oct 2024 18:54:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/96569#M39295</guid>
      <dc:creator>Angus-Dawson</dc:creator>
      <dc:date>2024-10-28T18:54:41Z</dc:date>
    </item>
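The optional-group workaround described above can be sketched in `pyproject.toml` (group names and version bounds here are illustrative, not taken from the post):

```toml
# Two mutually exclusive optional groups; install exactly one with
#   poetry install --with local --without remote --sync
#   poetry install --with remote --without local --sync
[tool.poetry.group.local]
optional = true

[tool.poetry.group.local.dependencies]
pyspark = "^3.5"

[tool.poetry.group.remote]
optional = true

[tool.poetry.group.remote.dependencies]
databricks-connect = "^15.4"
```

Because both packages install a `pyspark` module, the `--sync` flag matters: it removes the group being switched away from, rather than leaving the two installs to clobber each other.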
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/100042#M40175</link>
      <description>&lt;P&gt;Hi, we are facing this issue as well, i.e. the RuntimeError as reported in&amp;nbsp;&lt;A href="https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/89361/highlight/true#M37769" target="_self"&gt;this comment&lt;/A&gt;. We use the workaround with Poetry groups as suggested in&amp;nbsp;&lt;A href="https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/89361/highlight/true#M37769" target="_self"&gt;this comment.&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The workaround introduces unnecessary and non-intuitive complexity to dependency management and leaves room for introducing errors.&lt;/P&gt;&lt;P&gt;Is there any plan to fix this behaviour?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2024 09:07:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/100042#M40175</guid>
      <dc:creator>lukany</dc:creator>
      <dc:date>2024-11-26T09:07:26Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/100644#M40367</link>
      <description>&lt;P&gt;Databricks-Connect is by design a drop-in replacement for pyspark, essentially. It transparently takes over execution of the Spark parts without change to the program, and is definitely a different 'environment' from local pyspark.&lt;/P&gt;
&lt;P&gt;As with any situation where you need to deal with separate software environments, you'd typically have a separate venv for each, and as such could have pyspark in one and databricks-connect in another. Does that not answer this?&lt;/P&gt;</description>
      <pubDate>Mon, 02 Dec 2024 15:35:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/100644#M40367</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2024-12-02T15:35:45Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102016#M40933</link>
      <description>&lt;P class=""&gt;How can I configure the environment in pytest.ini?&lt;/P&gt;&lt;P class=""&gt;I need a local Spark session for unit testing, but I encountered the following error: RuntimeError: Only remote Spark sessions using Databricks Connect are supported.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 08:44:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102016#M40933</guid>
      <dc:creator>noah_sunny</dc:creator>
      <dc:date>2024-12-13T08:44:10Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102056#M40952</link>
      <description>&lt;P&gt;If you want to run Spark locally, you simply use pyspark!&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 13:34:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102056#M40952</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2024-12-13T13:34:12Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102203#M41017</link>
      <description>&lt;P&gt;I got the error mentioned in the original post due to the installation of Databricks Connect.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 07:51:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102203#M41017</guid>
      <dc:creator>noah_sunny</dc:creator>
      <dc:date>2024-12-16T07:51:19Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102536#M41152</link>
      <description>&lt;P&gt;I think that if you're deliberately installing databricks-connect, then you need to handle local Spark session creation yourself.&lt;/P&gt;&lt;P&gt;My issue is that I'm using the databricks-dlt package, which installs databricks-connect as a dependency. In the latest package version, 0.3.0, this breaks pyspark.sql SparkSession creation, forcing me to use DatabricksSparkSession.&lt;/P&gt;&lt;P&gt;To solve this I need to pin databricks-dlt to the previous release version, which is not ideal in case future releases bring new features that I would like to use.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 17:01:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/102536#M41152</guid>
      <dc:creator>mslow</dc:creator>
      <dc:date>2024-12-18T17:01:59Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/105517#M42165</link>
      <description>&lt;P&gt;Users should be able to have a single Python environment setup with a single set of Python dependencies specified (in pyproject.toml or similar) and installed, and alternately point their code at either a local or remote Spark cluster simply by changing the URL they pass to `&lt;SPAN&gt;DatabricksSession.builder&lt;/SPAN&gt;&lt;SPAN&gt;.remote(...)`.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;As far as I can tell, this is not possible. Databricks Connect does not work with a local, open source Spark Connect server (i.e. what you get when you run `sbin/start-connect-server.sh`). And open source Spark Connect does not work with a remote Databricks cluster. And as others have pointed out, installing `databricks-connect` and `pyspark` side-by-side yields a broken Python environment.&lt;/P&gt;&lt;P&gt;No one wants to have a Databricks cluster running 24/7 just so they can run their tests quickly. And having multiple Python environments just to handle these incompatibilities means being forced to abandon modern Python packaging tooling like Poetry and going back to manually wrangling venvs.&lt;/P&gt;&lt;P&gt;Is there a design reason Databricks cannot simply enable Databricks Connect to work with open source Spark Connect servers?&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 03:24:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/105517#M42165</guid>
      <dc:creator>nick</dc:creator>
      <dc:date>2025-01-14T03:24:47Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/105518#M42166</link>
      <description>&lt;P&gt;I don't know the details, but it's not quite the same 'connect'.&lt;/P&gt;&lt;P&gt;But I think you can simply have two venvs if you want: one with Connect and one with local pyspark. You probably do want to treat these as distinct environments; they are different environments. This is what virtual environments are for, IMHO.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 03:39:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/105518#M42166</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2025-01-14T03:39:37Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/105524#M42167</link>
      <description>&lt;P&gt;I don't see how this works with modern Python packaging tooling and standards.&lt;/P&gt;&lt;P&gt;How would your `pyproject.toml` look to support these multiple environments? How would modern build tools (like Poetry, Pipenv, Hatch, etc.) build/publish your project? How would someone build continuous integration testing?&lt;/P&gt;&lt;P&gt;What I'm understanding is that with databricks-connect you have to basically abandon modern Python packaging if you want to be able to run tests locally. You need to manually maintain multiple `requirements.txt` files and matching venvs, and manually switch between one and the other depending on whether your target is a local Spark cluster or remote one. As you switch back and forth, you'll probably need to futz with your IDE's config so that type and lint checks don't break. And to package your application for deployment, you'll have to build an sdist using a hand-rolled script; no `poetry build` for you.&lt;/P&gt;&lt;P&gt;This all feels very kludgy and outdated. There should be a better way, one that works naturally with modern Python packaging standards.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 05:39:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/105524#M42167</guid>
      <dc:creator>nick</dc:creator>
      <dc:date>2025-01-14T05:39:38Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/110246#M43530</link>
      <description>&lt;P&gt;I use databricks-connect for local IDE-based (PyCharm) development of Databricks jobs. The new databricks-connect with&amp;nbsp;DatabricksSession caused me a lot of trouble, since I needed to maintain two separate import systems: one for local development and one for job execution on Databricks. And this messy solution was suggested by Databricks &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect/python/examples#:~:text=The%20following%20example%20uses%20the%20DatabricksSession%20class%2C%20or%20uses%20the%20SparkSession%20class%20if%20the%20DatabricksSession%20class%20is%20unavailable%2C%20to%20query%20the%20specified%20table%20and%20return%20the%20first%205%20rows.%20This%20example%20uses%20the%20SPARK_REMOTE%20environment%20variable%20for%20authentication." target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;I think I found a workaround for this issue. I haven't tested it deeply, but basic functionality seems to be working.&lt;/P&gt;&lt;P&gt;With this I can create and use a SparkSession from my IDE, connecting to a remote Databricks cluster with&amp;nbsp;&lt;A href="https://docs.databricks.com/en/release-notes/runtime/15.4lts.html" target="_self"&gt;DBR 15.4&lt;/A&gt; (with Apache Spark 3.5.0).&lt;/P&gt;&lt;P&gt;So my solution is the following:&lt;/P&gt;&lt;P&gt;I've &lt;STRONG&gt;removed the databricks-connect&lt;/STRONG&gt; package and &lt;STRONG&gt;installed pyspark 3.5.0&lt;/STRONG&gt; instead. To access my remote Databricks cluster I use &lt;A href="https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html" target="_self"&gt;spark-connect&lt;/A&gt; and its SPARK_REMOTE env variable, which looks something like this:&lt;/P&gt;&lt;LI-CODE 
lang="markup"&gt;SPARK_REMOTE=sc://blahlbah.cloud.databricks.com:443/;token=mytoken;x-databricks-cluster-id=myclusterid&lt;/LI-CODE&gt;&lt;P&gt;I built the value of the variable based on &lt;A href="https://docs.databricks.com/en/dev-tools/databricks-connect/python/advanced.html#configure-the-spark-connect-connection-string" target="_self"&gt;this documentation&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;After configuring the env var for a Python script execution, I can use SparkSession as before:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 14 Feb 2025 16:48:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/110246#M43530</guid>
      <dc:creator>beliz</dc:creator>
      <dc:date>2025-02-14T16:48:33Z</dc:date>
    </item>
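The connection string in the post above follows a simple pattern; a small helper (hypothetical, with placeholder values, not part of any Databricks API) makes the pieces explicit:

```python
def spark_remote_url(host: str, token: str, cluster_id: str, port: int = 443) -> str:
    """Build a SPARK_REMOTE value of the shape shown in the post:
    sc://HOST:PORT/;token=TOKEN;x-databricks-cluster-id=CLUSTER_ID
    """
    return f"sc://{host}:{port}/;token={token};x-databricks-cluster-id={cluster_id}"


# Placeholder values, not real credentials.
url = spark_remote_url("blahlbah.cloud.databricks.com", "mytoken", "myclusterid")
```

Exporting the result as the SPARK_REMOTE environment variable before starting Python is what lets plain pyspark's SparkSession.builder.getOrCreate() pick up the remote cluster, as the workaround describes.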
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/112074#M44099</link>
      <description>&lt;P&gt;This issue is very bizarre and was very cumbersome to deal with, but I think I found a solution I can live with.&lt;/P&gt;&lt;P&gt;For me, using 2 venvs just complicates the project in a way I am not willing to maintain.&lt;BR /&gt;Spark Connect, although promising, lacks in a lot of areas compared to Databricks Connect, to name a few:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;the session needs to be rolled every hour due to the token going bad,&lt;/LI&gt;&lt;LI&gt;some cluster lifecycles are just not recognized by regular Spark Connect; for example, when the cluster is warming up, Databricks will wait for the cluster to be ready, while Spark Connect will raise an exception. One CAN map the different scenarios and build handling logic, but why would you?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;As I need this specifically for unit testing, and I use pytest, I downloaded the original pyspark to a directory in my root folder with:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;pip install --target pyspark_unpatched pyspark==X.Y.Z&lt;/LI-CODE&gt;&lt;P&gt;I created this conftest.py in the root of the project:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import os
import sys
from pathlib import Path

def override_databricks_spark() -&amp;gt; None:
    repo_root = Path(__file__).parent.resolve()
    unpatched_pyspark_dir = os.path.expanduser(repo_root / "pyspark_unpatched")
    sys.path.insert(0, unpatched_pyspark_dir)
    import pyspark as unpatched_pyspark
    sys.path.remove(unpatched_pyspark_dir)
    sys.modules["pyspark"] = unpatched_pyspark
    print(f"Overridden pyspark with module from {unpatched_pyspark_dir}")

override_databricks_spark()&lt;/LI-CODE&gt;&lt;P&gt;Because conftest.py loads first, it lets you make sure the unpatched version of pyspark loads first, letting you test peacefully.&lt;/P&gt;&lt;P&gt;The main drawback is that you must use the `pyspark` import statement; using the Databricks imports will break the tests (not that it matters, at least I think it doesn't). And obviously you have to download pyspark, but considering the alternatives, I am willing to bite that bullet.&lt;/P&gt;&lt;P&gt;This was my &lt;A href="https://stackoverflow.com/questions/43162722/mocking-a-module-import-in-pytest" target="_self"&gt;reference&lt;/A&gt; from Stack Overflow.&lt;/P&gt;</description>
      <pubDate>Sat, 08 Mar 2025 21:48:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/112074#M44099</guid>
      <dc:creator>solanoam</dc:creator>
      <dc:date>2025-03-08T21:48:58Z</dc:date>
    </item>
    <item>
      <title>Re: Installing Databricks Connect breaks pyspark local cluster mode</title>
      <link>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/112574#M44256</link>
      <description>&lt;P&gt;I agree with most of the comments above that the current approach of databricks-connect is not great (it sucks, to be frank). It's an issue that has been bugging me for more than 2 years now.&lt;BR /&gt;By the way, I checked how this could be done with Poetry and uv. In Poetry I got something somewhat working since this PR was merged:&amp;nbsp;&lt;A href="https://github.com/python-poetry/poetry/pull/9553" target="_blank"&gt;https://github.com/python-poetry/poetry/pull/9553&lt;/A&gt;&lt;BR /&gt;With uv I still don't have an acceptable solution. In fact, the uv devs seem to claim that there is no way to solve this with the current Python packaging specifications: &lt;A href="https://github.com/astral-sh/uv/issues/10238#issuecomment-2575893989" target="_blank"&gt;https://github.com/astral-sh/uv/issues/10238#issuecomment-2575893989&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 11:41:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/installing-databricks-connect-breaks-pyspark-local-cluster-mode/m-p/112574#M44256</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2025-03-14T11:41:24Z</dc:date>
    </item>
  </channel>
</rss>

