Installing Databricks Connect breaks pyspark local cluster mode
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-02-2024 11:14 PM
Hi, It seems that when databricks-connect is installed, pyspark is at the same time modified so that it will not anymore work with local master node. This has been especially useful in testing, when unit tests for spark-related code without any remote session.
Without databricks-connect this code works fine to initialize local spark session:
spark = SparkSession.Builder().master("local[1]").getOrCreate()However, when databricks-connect python package is installed that same code fails with
> RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Could not find connection parameters to start a Spark remote session.
Question: Why does it work like this? Also, is this documented somewhere? I do not see it mentioned in Databricks Connect Troubleshooting or Limitations documentation pages. Same issue has been asked at Running pytest with local spark session · Issue #1152 · databricks/databricks-vscode · GitHub.