
Errors using Databricks Extension for VS Code on Windows

m997al
Contributor III

Hi - I am trying to get VS Code (running on Windows) working with the Databricks extension for VS Code, and it seems like I can almost get there.  Here is my setup:

1. Using Databricks Extension v2.4.0
2. Connecting to Databricks cluster with runtime 15.4 LTS ML (Python 3.11)
3. On Windows VS Code, using a virtual environment with Python 3.11.9
4. Using VS Code 1.93.1
5. As part of the Databricks Extension install in VS Code, Databricks Connect 13.3.2 was installed

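For reference, the databricks-connect version is generally expected to match the cluster runtime's major.minor version (e.g. 15.4.x for runtime 15.4 LTS), which the 13.3.2 install above does not. A quick hypothetical sketch for checking this locally (the helper functions are mine, not part of the extension):

```python
# Sketch: compare the installed databricks-connect version against the
# cluster's Databricks Runtime version. Helper names are hypothetical.
from importlib import metadata


def versions_match(local_version: str, runtime_version: str) -> bool:
    """True if major.minor of databricks-connect matches the runtime."""
    return local_version.split(".")[:2] == runtime_version.split(".")[:2]


def installed_version(package: str):
    """Return the installed version of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The setup above pairs runtime 15.4 with databricks-connect 13.3.2,
# which this check flags as a mismatch.
print(versions_match("13.3.2", "15.4"))  # False -> mismatch
```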
What works:

1. Connect to cluster is OK
2. The ".bundle/..." workspace folder for temp work is created.
3. Python environment and Databricks Connect seem to have no errors in VS Code
4. Using the "Databricks" icon in VS Code, I can "Run current file with Databricks Connect", and my code appears to run!  This would be great, except...
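One way to check where code like this actually executed is to read the cluster id from the Spark conf: Databricks clusters set spark.databricks.clusterUsageTags.clusterId, which a plain local session does not have. The helper below is a hypothetical sketch, not an official API:

```python
# Sketch: report whether code ran on a Databricks cluster by probing the
# Spark conf for the cluster id tag. The helper name is hypothetical.
def describe_execution(conf_get) -> str:
    """conf_get: a callable like spark.conf.get; raises if the key is absent."""
    try:
        cluster_id = conf_get("spark.databricks.clusterUsageTags.clusterId")
        return "ran on cluster " + cluster_id
    except Exception:
        return "no Databricks cluster id found (likely local execution)"


# With Databricks Connect, for example:
# from databricks.connect import DatabricksSession
# spark = DatabricksSession.builder.getOrCreate()
# print(describe_execution(spark.conf.get))
```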

What doesn't work:

1. I don't see an easy way to verify that the code actually ran on my Databricks cluster.
2. The local environment variables on my VS Code machine are used instead of the environment variables set up on the cluster.  This is a big issue: I specifically want to use the cluster's environment variables (which reference secret scopes) and move away from local environment variables in VS Code.
3. Running "databricks-connect test" from the command line in the virtual environment on the Windows machine fails with this error:  \Lib\site-packages\pyspark\sql\session.py", line 488, in getOrCreate: RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Could not find connection parameters to start a Spark remote session.
3.1. I am not sure whether that even matters, since Databricks Connect appears to be built into the extension and doesn't need to run from the command line, but I still wouldn't expect this error.
4. Finally, and most seriously, when I hit "sync" to sync the Databricks workspace folder ".../.bundle/..." with the local repo folder, the files sync, but a "Building..." message continues forever.
4.1. The sync appears to complete (this was my second run; the first run showed all the files syncing), but the "Building..." message never stops.
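For the "Could not find connection parameters" error in point 3: Databricks Connect looks for a workspace host, a token, and a cluster id, supplied either through environment variables or a ~/.databrickscfg profile. A minimal pre-flight sketch (the variable names are the standard Databricks ones; the helper function itself is hypothetical):

```python
# Sketch: check that the connection parameters Databricks Connect needs
# are present before trying to start a remote Spark session.
import os

REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def missing_connection_vars(env=os.environ) -> list:
    """Return the connection variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_connection_vars()
if missing:
    print("Spark remote session cannot start; missing:", ", ".join(missing))
```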

This would be a really great thing for our department if I can get it to work.  I want to stop relying on local environment variables entirely and control everything through Databricks secret scopes referenced via cluster environment variables.  It seems like I am close.

Can anyone help?  Thanks!


m997al
Contributor III

So I found my problem(s).  

  1. I had a local environment variable called "DATABRICKS_HOST" that was set to the wrong URL.
  2. My Databricks runtime version and my databricks-connect version did not match.  Once I made them both 15.4.x, everything worked as expected.
  3. The "Upload and Run File" option of the Databricks extension was exactly what I needed: it uses the environment variables present on the connected Databricks cluster itself (which reference Databricks secret scopes).  So that issue was solved as well.
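To illustrate the pattern in point 3: code launched with "Upload and Run File" executes on the cluster, so it sees the cluster's own environment variables, which can be populated from secret scopes using the {{secrets/&lt;scope&gt;/&lt;key&gt;}} reference syntax in the cluster configuration. A hypothetical sketch (the variable name MY_API_TOKEN is made up for illustration):

```python
# Sketch: on the cluster, read an environment variable whose value the
# cluster config fills in from a secret scope (e.g. set MY_API_TOKEN to
# {{secrets/my-scope/my-key}} in the cluster's environment variables).
import os


def read_cluster_secret(name: str, env=os.environ) -> str:
    """Read an environment variable set on the cluster; fail loudly if absent."""
    value = env.get(name)
    if value is None:
        raise KeyError(name + " is not set on this cluster")
    return value


# On the cluster this returns the secret-backed value; locally it raises:
# token = read_cluster_secret("MY_API_TOKEN")
```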

There is still some question about the "sync" task, which runs forever in VS Code until you cancel it, but this is not a show-stopper.  
