Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How Are You Using Local IDEs (VS Code / Cursor / Whatever) to Develop & Run Code in Databricks?

adhi_databricks
Contributor

Hi everyone,

I’m trying to set up a smooth local-development workflow for Databricks and would love to hear how others are doing it.

My Current Setup

  • I do most of my development in Cursor (VS Code-based editor) because the AI agents make coding much faster.

  • After development, I push code to Git, then open Databricks and pull the repo, and only then can I run and test the code inside a Databricks notebook or job.

  • This back-and-forth is slow, so I'd like to run/test directly from my local IDE if possible.

What I Tried

1. Databricks VS Code Extension

I saw that the Databricks docs mention a VS Code extension, but:

  • Cursor doesn’t seem to allow me to install this extension.

  • Not sure if Cursor supports Databricks extensions at all.

Has anyone successfully used the Databricks VS Code extension inside Cursor?

2. Databricks Connect

I also tried Databricks Connect. Tutorials show connecting to a Personal Compute cluster, but:

  • In my organization, compute is owned by a service principal (service account).

  • I’m added as a user, but when I list clusters through Databricks Connect, I don’t see any clusters.

  • So the connect step fails.

I'm not sure if this is a permissions issue, or if Databricks Connect only works with Personal Compute.
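
For reference, here is roughly what I'm attempting: a minimal Databricks Connect sketch, assuming databricks-connect >= 13 and placeholder host/token/cluster values, where the cluster ID is pasted in directly instead of discovered by listing clusters.

# Minimal sketch -- all values below are placeholders.
# The cluster ID is passed explicitly (it can be copied from the cluster's URL),
# so this does not depend on being able to list clusters.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://<workspace>.cloud.databricks.com",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
).getOrCreate()

# Smoke test: executes on the remote cluster, the result returns locally.
spark.sql("SELECT current_user() AS me").show()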

My Questions

  1. How are you all developing locally and executing code in Databricks?
    Do you run code locally against DBFS / clusters, or do you push to repos and test in notebooks?

  2. Does Databricks Connect work with shared or service-principal–owned clusters?
    Or only with Personal Compute?

  3. Is there any known workaround to make the VS Code extension work in Cursor?

  4. Is there any other method I’m missing for local development + remote execution?

Any advice, examples, or even your workflow setups would be super helpful.
Thanks!

1 ACCEPTED SOLUTION

szymon_dybczak
Esteemed Contributor III

Hi @adhi_databricks ,

Since Cursor is based on open-source VS Code, you should be able to install the Databricks extension.
Check out the video below; it shows how to set up Cursor to work with Databricks locally 🙂

Databricks + Cursor IDE: Step-by-Step AI Coding Tutorial

2 REPLIES

siva-anantha
New Contributor III

@adhi_databricks: I want to add my perspective on pure local development (without Databricks Connect).

I wanted to set up a local development environment without connecting to a Databricks workspace or cloud storage: develop PySpark code in VS Code using local Spark and GenAI. After local development, I wanted to deploy the code to the Databricks workspace for data engineering alone.

We faced the following challenges whether we used GenAI or not.

1) Notebook Architecture
   - We import notebooks using %run, and linting support in notebooks is minimal (effectively none). We keep hitting runtime errors for syntax and sometimes indentation that a formatting/linting phase should have caught (see the module sketch after this list).
   - Preparing coverage reports, running code quality checks with tools like Sonar, and executing test cases are all challenging.
   - Linting, testing, code quality analysis, and coverage checks must happen before the code is pushed to a Databricks workspace in a test or prod environment.
 
2) Strong Dependencies on Cloud Storage, Delta, and the Metastore (Hive or UC)
    - It is difficult or impossible to mock cloud storage, Delta, and the metastore.
    - We were unable to test all parts of the code because of these dependencies.
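
As an illustration of the %run problem from challenge 1 (hypothetical names): once the logic lives in a plain module instead of a notebook pulled in via %run, linters, formatters, and test runners can actually see it.

# transforms.py (hypothetical module name).
# With `%run ./transforms` in a notebook, these names enter scope invisibly and
# linters cannot resolve them; as a plain module, the code is fully lintable.
from pyspark.sql import DataFrame, functions as F
from pyspark.sql.window import Window

def dedupe_latest(df: DataFrame, key: str, ts: str) -> DataFrame:
    """Keep only the most recent row per key."""
    w = Window.partitionBy(key).orderBy(F.col(ts).desc())
    return df.withColumn("_rn", F.row_number().over(w)).filter("_rn = 1").drop("_rn")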

We were able to overcome some of the issues with the following approach:

1) Create a pure Python package for the framework/reusable code, i.e., code that does not change often. It is formatted, linted, tested, and built into a wheel; it is a common package with its own lifecycle, and developers install it locally in a Python venv.

2) Develop the features/changes for the current release/sprint. These will eventually become a wheel too, but during development they can live as separate modules in VS Code, importing what they need from the common package. Code formatting, linting, and unit testing can be automated for these changes with GenAI help (see the pytest sketch after this list).

3) Dependency handling: a Delta Lake/Hive abstraction implemented using Docker. This is time consuming but possible; when developers start VS Code, a Docker container can start alongside it and make the dependencies ready. It is not a smooth setup and has its own issues (a lighter local-Delta alternative is sketched below).
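
To make point 2 concrete, here is a unit-test sketch for the hypothetical transforms module above: plain pytest against local Spark, with no workspace involved.

# tests/test_transforms.py -- runs on local Spark only.
import pytest
from pyspark.sql import SparkSession
from transforms import dedupe_latest  # the hypothetical module sketched earlier

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()

def test_dedupe_latest_keeps_newest_row(spark):
    df = spark.createDataFrame(
        [("a", 1, "old"), ("a", 2, "new"), ("b", 1, "only")],
        ["id", "ts", "val"],
    )
    out = dedupe_latest(df, key="id", ts="ts")
    assert {r["id"]: r["val"] for r in out.collect()} == {"a": "new", "b": "only"}

And for point 3, if Docker feels heavy, the delta-spark pip package can provide local Delta tables on the filesystem (a sketch; it does not cover the Hive/UC metastore dependency).

# Local Delta without Docker or cloud storage, assuming `pip install delta-spark`
# with a version matching your local PySpark.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.master("local[2]")
    .appName("local-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Delta tables live on the local filesystem instead of S3/ADLS.
spark.range(3).write.format("delta").mode("overwrite").save("/tmp/demo_delta")
spark.read.format("delta").load("/tmp/demo_delta").show()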

Hope this helps