Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Installing Databricks Connect breaks pyspark local cluster mode

htu
New Contributor II

Hi, it seems that installing databricks-connect also modifies pyspark so that it no longer works with a local master node. Local mode has been especially useful in testing, where unit tests for Spark-related code run without any remote session.

Without databricks-connect, this code works fine to initialize a local Spark session:

spark = SparkSession.Builder().master("local[1]").getOrCreate()

However, when the databricks-connect Python package is installed, that same code fails with:

> RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Could not find connection parameters to start a Spark remote session.
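To confirm which of the two packages is actually present in the failing environment, a small diagnostic like the following may help. It is only a sketch: the helper names `installed_version` and `spark_packages` are made up for illustration, and it assumes the packages are looked up under their pip distribution names `pyspark` and `databricks-connect`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def spark_packages():
    """Report the versions of the two distributions discussed in this thread."""
    return {
        name: installed_version(name)
        for name in ("pyspark", "databricks-connect")
    }


if __name__ == "__main__":
    # A non-None entry for databricks-connect suggests this environment
    # is the one affected by the error above.
    print(spark_packages())
```

Running it in both the working and the failing environment shows whether the presence of databricks-connect is the only difference between them.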

Question: Why does it work like this? Also, is this documented somewhere? I do not see it mentioned in the Databricks Connect Troubleshooting or Limitations documentation pages. The same issue has been raised in Running pytest with local spark session · Issue #1152 · databricks/databricks-vscode · GitHub.

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @htu,

When you install Databricks Connect, it modifies the behaviour of PySpark in a way that prevents it from working with the local master node. This can be frustrating, especially when you’re trying to run unit tests for Spark-related code without any remote session.

Here are a few points to consider:

  1. Databricks Connect Purpose:

    • Databricks Connect is designed to enable remote Spark sessions from your local development environment. It allows you to connect to a Databricks cluster and execute Spark jobs remotely.
    • The modification you’re observing is intentional because Databricks Connect is primarily meant for remote sessions.
  2. Local Master Node Issue:

    • When you create a Spark session with master("local[1]"), it sets up a local Spark cluster with one worker thread.
    • However, Databricks Connect overrides this behaviour to ensure that only remote Spark sessions using Databricks Connect are supported.
    • As a result, attempting to use the local master node with Databricks Connect installed leads to the error you encountered.
  3. Workaround:

    • If you need to run unit tests locally without a remote session, consider using a different configuration.
    • Instead of using master("local[1]"), you can set up a standalone Spark cluster (e.g., by specifying a specific Spark master URL) or use a different local mode.
    • Keep in mind that this won’t be exactly the same as the Databricks environment, but it should allow you to test your Spark-related code locally.
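For readers who want something concrete: one commonly used approach, which is not confirmed anywhere in this thread, is to run the unit tests in a separate virtual environment that has plain pyspark and no databricks-connect. The sketch below assumes that setup. `local_spark` and its `build` hook are hypothetical names introduced for illustration, and the matched message text is copied from the error quoted in the question.

```python
def local_spark(build=None):
    """Return a local[1] SparkSession, or fail with a hint about the environment.

    `build` is a hypothetical hook so the error handling can be exercised
    without Spark installed; by default it builds a real local session.
    """
    if build is None:
        def build():
            # Imported lazily so this module loads even without pyspark.
            from pyspark.sql import SparkSession
            return SparkSession.builder.master("local[1]").getOrCreate()
    try:
        return build()
    except RuntimeError as exc:
        # Message text taken from the error quoted in this thread.
        if "Only remote Spark sessions" in str(exc):
            raise RuntimeError(
                "Local mode is unavailable here, probably because "
                "databricks-connect is installed; run these tests in an "
                "environment that has plain pyspark instead."
            ) from exc
        raise
```

In a test suite this could be called from a session-scoped fixture, so that a misconfigured environment fails once with a readable message instead of in every test.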

In summary, Databricks Connect intentionally modifies PySpark behaviour to support remote sessions, which affects local master nodes.

While the documentation may not explicitly cover this behaviour, exploring alternative configuration... Keep an eye out for updates from Databricks, as they may address this limitation in future releases.

Feel free to ask if you have any further questions or need additional assistance! 😊

 

htu
New Contributor II

Hi, I understand what Databricks Connect is used for (that's why I'm trying it out), but I would also like to be able to run tests. What do you mean by "different local mode"?

As a side topic, I tried running pytest tests with a Databricks Connect session (both against a spark-connect server running in a container at sc://localhost and against Azure Databricks via DatabricksSession). Some of the tests fail with "Windows fatal exception: access violation" in both cases, so that doesn't really work either.
