
VS Code integration with Python Notebook and Remote Cluster

mohaimen_syed
New Contributor III

Hi, I'm trying to work in VS Code on my local machine instead of using the Databricks environment in my browser. I have gone through the documentation to set up the Databricks extension and have also set up Databricks Connect, but they don't seem to work hand in hand. I'm trying to run a Python notebook using Databricks Connect but can't get it to run on the remote Databricks cluster.

3 REPLIES

Kaniz
Community Manager

Hi @mohaimen_syed, It sounds like you’re trying to use Databricks Connect to run a Python notebook on a remote Azure Databricks cluster from your local machine.

Let’s break down the steps to achieve this:

  1. Configure Azure Databricks Authentication:

    • Before using Databricks Connect, ensure that your Azure Databricks workspace and cluster meet the requirements for Databricks Connect.
    • You’ll need your cluster ID, which you can find by clicking on “Compute” in your workspace sidebar and then copying the cluster ID from the URL.
    • Install the Databricks CLI if you haven’t already. On Linux/macOS, you can use Homebrew, and on Windows, you have a few options like winget, Chocolatey, or Windows Subsystem for Linux (WSL). Refer to the official documentation for installation instructions.
  2. Install PyCharm:

    • Make sure you have PyCharm installed on your local machine. The tutorial was tested with PyCharm Community Edition 2023.3.5, but other versions should work as well.
  3. Python Version Compatibility:

    • Ensure that the minor version of Python installed on your development machine matches the minor Python version of your Azure Databricks cluster. Here’s a quick reference:
      • Databricks Runtime 15.0 ML, 15.0: Python 3.11
      • Databricks Runtime 13.0 ML - 14.3 ML, 13.0 - 14.3: Python 3.10
  4. Using Databricks Connect in PyCharm:

    • Open your Python project in PyCharm.
    • Install the databricks-connect package using pip: pip install databricks-connect.
    • Configure Databricks Connect in PyCharm by specifying your cluster ID and authentication details.
    • Create a new Python script or notebook in PyCharm.
    • Import DatabricksSession from the databricks.connect module and use it to create a Spark session on your remote Databricks cluster.
    • You can now run your Python code against the cluster directly from PyCharm.
  5. Alternative: JupyterLab Integration:

    • If you prefer using JupyterLab, you can integrate it with Databricks. Start JupyterLab using the standard command: $ jupyter lab.
    • In the notebook, select the remote kernel from the menu to connect to the Databricks cluster. You’ll get a Spark session using the following Python code:
      from databrickslabs_jupyterlab.connect import dbcontext
      dbcontext()
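
For the configuration mentioned in steps 1 and 4, one option is a configuration profile in ~/.databrickscfg holding the host, token, and cluster_id fields. The sketch below is only an illustration, not part of the tutorial: it assumes personal access token authentication, and the profile name dev, the workspace URL, the token, and the cluster ID are all hypothetical placeholders.

```python
import configparser
import io


def render_profile(name, host, token, cluster_id):
    """Return .databrickscfg text for a single profile."""
    # Interpolation is disabled so values are written exactly as given.
    config = configparser.ConfigParser(interpolation=None)
    config[name] = {"host": host, "token": token, "cluster_id": cluster_id}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical placeholder values; substitute your own workspace details.
profile_text = render_profile(
    name="dev",
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    token="your-access-token",
    cluster_id="your-cluster-id",
)
print(profile_text)
```

After adding the printed section to ~/.databrickscfg, a session can be requested with DatabricksSession.builder.profile("dev").getOrCreate().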
      

Remember to adjust the steps based on your specific setup and preferences. If you encounter any issues, feel free to ask for further assistance! 😊
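
One setup detail worth checking early is the Python version match from step 3. Here is a minimal sketch of that check, using only the two runtime ranges listed above; the function names are hypothetical, and runtimes outside the list are rejected rather than guessed.

```python
import sys


def expected_python(runtime_version):
    """Map a Databricks Runtime version such as "14.3" or "15.0 ML" to (major, minor)."""
    # Drop a trailing " ML" and keep only the first two numeric components.
    major, minor = (int(part) for part in runtime_version.split()[0].split(".")[:2])
    if (major, minor) == (15, 0):
        return (3, 11)
    if (13, 0) <= (major, minor) <= (14, 3):
        return (3, 10)
    raise ValueError(f"Runtime {runtime_version} is not covered by the list above")


def python_matches(runtime_version, local=None):
    """Return True if the local Python minor version matches the runtime's."""
    local = tuple(sys.version_info[:2]) if local is None else tuple(local)
    return local == expected_python(runtime_version)
```

Calling python_matches("14.3") on your development machine returns False if the local interpreter is not Python 3.10, which is a hint to create a matching virtual environment before installing databricks-connect.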

For more detailed information, you can refer to the official documentation on Databricks Connect for Python [1]. Additionally, if you’re interested in using JupyterLab, check out the JupyterLab-Databricks Integration blog post [2].

Happy coding!

To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.

If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.

We appreciate your participation and are here to assist you further if you need it!

 

mohaimen_syed
New Contributor III

Hi @Kaniz, I mostly use VS Code as my environment of choice because it has the added benefit of Copilot, so my question is: can the same be done in VS Code instead of PyCharm/JupyterLab?

Hi @mohaimen_syed, You can absolutely use Visual Studio Code (VS Code) as your development environment for working with Databricks Connect. In fact, VS Code is a popular choice among developers, and with the added benefit of Copilot, it can enhance your productivity even further.

Here’s how you can set up Databricks Connect in VS Code:

  1. Install Databricks Connect:
    • First, ensure that you have the databricks-connect package installed. You can do this by running the following command in your terminal or command prompt:
      pip install databricks-connect
      
  2. Configure Databricks Connect:
    • Next, configure Databricks Connect by specifying your workspace URL, cluster ID, and authentication details. This is the information needed to connect to your remote Databricks cluster.
  3. Create a Python Script or Notebook in VS Code:
    • Open VS Code and create a new Python script or notebook.
  4. Import DatabricksSession:
    • In your Python script or notebook, import DatabricksSession from the databricks.connect module. This class builds the connection to your remote Databricks cluster.
  5. Connect to the Remote Cluster:
    • Use DatabricksSession.builder to create a Spark session that runs on your remote Databricks cluster.

Here’s an example of how you can connect to your remote Databricks cluster using Databricks Connect (for Databricks Runtime 13.0 and above) in a Python script within VS Code:

from databricks.connect import DatabricksSession

# Connect to your remote Databricks cluster
spark = DatabricksSession.builder.remote(
    host="https://your-workspace-url",
    token="your-access-token",
    cluster_id="your-cluster-id",
).getOrCreate()

# Now you can interact with the cluster using Spark commands
# For example:
# df = spark.sql("SELECT * FROM your_table")
# df.show()

Remember to replace "https://your-workspace-url", "your-access-token", and "your-cluster-id" with your actual workspace URL, access token, and cluster ID. Once you’ve set up Databricks Connect and created the session, you should be able to run Python code on the remote Databricks cluster directly from within VS Code.
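
If you would rather not hard-code the token in a script or notebook, Databricks Connect can also pick up its settings from environment variables. The following is a sketch under that assumption, using personal access token authentication; the helper names missing_connect_vars and get_spark are hypothetical, and the import is deferred so the check itself runs even where databricks-connect is not installed.

```python
import os

# Environment variables read when no explicit connection arguments are given.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def missing_connect_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def get_spark():
    """Create a Spark session on the remote cluster from environment variables."""
    missing = missing_connect_vars()
    if missing:
        raise RuntimeError("Set these variables first: " + ", ".join(missing))
    # Imported lazily so the check above works without databricks-connect installed.
    from databricks.connect import DatabricksSession

    # With no explicit arguments, the builder falls back to the environment.
    return DatabricksSession.builder.getOrCreate()
```

In a VS Code notebook cell you could then call spark = get_spark() followed by spark.range(5).show() as a quick check that the code really runs on the cluster.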

Enjoy exploring Databricks Connect in your favourite development environment! If you have any further questions or need assistance, feel free to ask. 😊


I’ve tailored the instructions specifically for VS Code, considering its popularity and the added benefit of Copilot. If you encounter any issues during the setup process, don’t hesitate to ask for further guidance!