Community Platform Discussions

Enhance Documentation for databricks-connect for Python

santiagortiiz
New Contributor III

Hello, I'm setting up a Python environment to work with Databricks in VS Code on Windows.

I followed the instructions in Install Databricks Connect for Python, and found some issues/conflicts/errors:

- There is a useless tip in the "Set up the client" section: it says that if you have the VS Code extension you do not need to follow the remaining instructions. This makes no sense, because you still need to configure and set up the client.

- The documentation asks you to uninstall pyspark before installing databricks-connect, but the latter does not install pyspark. The problem is that the end of the documentation page says:
"run a simple PySpark command, such as spark.range(1,10).show(). If there are no errors, you have successfully connected."
Following that step raises this error:

Traceback (most recent call last):
  File "C:\Users\Santiago_Ortiz\EPAM\Clients\RDSA\poc\.venv\Scripts\find_spark_home.py", line 92, in <module>
    print(_find_spark_home())
          ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Santiago_Ortiz\EPAM\Clients\RDSA\poc\.venv\Scripts\find_spark_home.py", line 56, in _find_spark_home
    module_home = os.path.dirname(find_spec("pyspark").origin)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'origin'
\Users\Santiago_Ortiz\EPAM\Clients\RDSA\poc\.venv/Scripts/pyspark: line 24: /bin/load-spark-env.sh: No such file or directory
Python 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Could not open PYTHONSTARTUP
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Program Files/Git/shell.py'
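
The AttributeError in the traceback above comes from find_spec("pyspark") returning None, which means the interpreter in the virtual environment could not locate a pyspark module. A minimal sketch for checking this directly follows. Here module_origin is a hypothetical helper name, and databricks.connect is assumed to be the import name that the databricks-connect package provides.

```python
from importlib.util import find_spec


def module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = find_spec(name)
    except (ImportError, ValueError):
        # find_spec raises for dotted names whose parent package is missing.
        return None
    # A spec can exist with origin=None (for example, namespace packages).
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Module names are assumptions about what the environment should contain.
    for name in ("pyspark", "databricks.connect"):
        print(f"{name}: {module_origin(name) or 'not found'}")
```

Running this with the same interpreter as the virtual environment shows whether each module can be located before launching the pyspark shell.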

Also, spark is not available as a global variable in the resulting Python shell:

>>> spark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
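
Outside a Databricks notebook, a plain Python interpreter does not define spark for you, so the session has to be created explicitly. The following is a minimal sketch, not the documented procedure. It assumes a databricks-connect release that provides DatabricksSession (13.0 or later, as far as I can tell) and that authentication is already configured through a default profile or environment variables. get_spark is a hypothetical helper name.

```python
def get_spark():
    """Create (or reuse) a Spark session through Databricks Connect."""
    # Imported lazily so a missing package produces a clear message
    # instead of failing at module import time.
    try:
        from databricks.connect import DatabricksSession
    except ImportError as exc:
        raise RuntimeError(
            "databricks-connect is not importable in this environment"
        ) from exc
    # Assumption: connection details come from the default configuration.
    return DatabricksSession.builder.getOrCreate()
```

With this in place, the check from the documentation would be spark = get_spark() followed by spark.range(1, 10).show(), run from an ordinary Python script or REPL.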

The documentation is very disorganised, and the steps cannot be reproduced as written to achieve the desired result.

1 REPLY

santiagortiiz
New Contributor III

Additionally, the documentation does not explain how to avoid uploading the virtual environment to the catalog when using the Databricks extension for VS Code.
