Hello, I'm setting up a Python environment to work with Databricks in VS Code on Windows.
I followed the instructions in Install Databricks Connect for Python and found some issues, conflicts, and errors:
- There is a useless tip in the "Set up the client" section: it says that if you have the VS Code extension, you do not need to follow the next instructions. This makes no sense, because you still need to configure and set up the client.
- The documentation asks you to uninstall pyspark before installing databricks-connect, but databricks-connect does not install pyspark. The problem is that, at the end of the documentation page, it says:
"run a simple PySpark command, such as spark.range(1,10).show(). If there are no errors, you have successfully connected."
Trying this produces the following error:
Traceback (most recent call last):
  File "C:\Users\Santiago_Ortiz\EPAM\Clients\RDSA\poc\.venv\Scripts\find_spark_home.py", line 92, in <module>
    print(_find_spark_home())
          ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Santiago_Ortiz\EPAM\Clients\RDSA\poc\.venv\Scripts\find_spark_home.py", line 56, in _find_spark_home
    module_home = os.path.dirname(find_spec("pyspark").origin)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'origin'
\Users\Santiago_Ortiz\EPAM\Clients\RDSA\poc\.venv/Scripts/pyspark: line 24: /bin/load-spark-env.sh: No such file or directory
Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Could not open PYTHONSTARTUP
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Program Files/Git/shell.py'
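The first traceback shows that `find_spec("pyspark")` returned `None`, meaning no `pyspark` module was importable from the interpreter that ran `find_spark_home.py`, and the `/bin/load-spark-env.sh` path in the next message suggests `SPARK_HOME` ended up empty. A quick check along these lines, run with the virtual environment's own Python, can show which distributions are installed and where `pyspark` resolves from. This is a minimal sketch of my own, not something from the Databricks documentation; the helper names `installed_version` and `describe_spark_setup` are hypothetical.

```python
import os
import sys
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_spark_setup():
    """Collect the facts that find_spark_home.py depends on, without raising."""
    spec = find_spec("pyspark")
    return {
        # Interpreter actually running this check (it should live inside .venv).
        "python": sys.executable,
        # Distributions as the package metadata reports them.
        "databricks-connect": installed_version("databricks-connect"),
        "pyspark": installed_version("pyspark"),
        # Where `import pyspark` would load from; None matches the AttributeError above.
        "pyspark_origin": spec.origin if spec is not None else None,
        # Unset or empty here is consistent with the '/bin/load-spark-env.sh' message.
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
    }


if __name__ == "__main__":
    for key, value in describe_spark_setup().items():
        print(f"{key}: {value}")
```

Running it directly with `.venv\Scripts\python.exe` (instead of through the Git Bash `pyspark` wrapper) keeps the result tied to the same environment the traceback came from.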
Also, spark is not available as a global variable:
>>> spark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
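A plain Python REPL does not predefine `spark` the way a Databricks notebook does. With Databricks Connect for Databricks Runtime 13 and later, the session is normally created explicitly through `DatabricksSession`. The sketch below assumes databricks-connect 13.x or later is installed in the active `.venv` and that the connection details (host, token, cluster ID) are already configured, for example in the DEFAULT profile of `.databrickscfg`; `get_spark` and `smoke_test` are hypothetical helper names.

```python
def get_spark():
    """Create (or reuse) a remote Spark session through Databricks Connect.

    Assumption: databricks-connect 13.x+ is installed and a default
    connection profile is configured; nothing is passed explicitly here.
    """
    # Imported lazily so this file can be loaded even where the package is missing.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.getOrCreate()


def smoke_test():
    """Run the check the documentation suggests: spark.range(1, 10).show()."""
    spark = get_spark()
    spark.range(1, 10).show()
```

If the connection works, calling `smoke_test()` should print a one-column table with the values 1 to 9.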
The documentation is very disorganized; nobody can follow the steps exactly as written and achieve the desired result.