Local Development on Databricks

isaac_gritz
Valued Contributor II

How to Develop Locally on Databricks with your Favorite IDE

dbx is a Databricks Labs project that lets you develop code locally in your favorite IDE, such as VS Code, PyCharm, IntelliJ, or Eclipse, and then submit it to Databricks interactive and job compute clusters (AWS | Azure | GCP).

dbx extends the Databricks CLI, making it easy to manage multiple execution environments and deployment configurations, and it ships with pre-built templates for integration with popular CI tools such as GitHub Actions, Azure DevOps, and GitLab.

Databricks also has an official VS Code extension that lets you execute locally written code against job or all-purpose clusters. In addition, the official Databricks Driver for SQLTools lets you browse SQL objects and run SQL queries in Databricks workspaces from within VS Code.

Let us know in the comments if you have had a chance to test out dbx or our VS Code plugins for local IDE development!

6 REPLIES

-werners-
Esteemed Contributor III

I use it for local development of our libraries. Works fine, but I haven't yet used it to submit to clusters.

matiasm
New Contributor II

I've found "dbx" really interesting. In particular it makes interactions with databricks from the local environment very smooth. I love how the dbx documentation describes the entire development process: it's the first time I see support for good engineering practices on the development phase.

There's one use case that dbx isn't helping me with. I would like to develop a model locally using only PySpark, accessing data on DBFS. I'm willing to use dbutils, but running it locally requires databricks-connect, which doesn't support Databricks Runtime 11, the latest one. Is there any other way to use dbutils locally?

isaac_gritz
Valued Contributor II

Hi @Matias Marenchino, unfortunately you cannot run dbutils locally, but you can use dbx execute against an interactive cluster for a more interactive development experience.

slowder
New Contributor II

We could build a helper function that detects whether we're running on generic PySpark or on a Databricks cluster. That way, when Databricks dbutils isn't available, we'd have a stand-in that lets us work disconnected until our code is ready to deploy to a cluster. @Isaac Gritz
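
A minimal sketch of the kind of helper described above, assuming the DATABRICKS_RUNTIME_VERSION environment variable (which Databricks sets on its clusters) as the detection signal; the get_dbutils, LocalDBUtilsStub, and _LocalFs names are hypothetical, and the stub only covers the dbutils.fs calls shown:

import os
from pathlib import Path

class _LocalFs:
    """Stand-in for dbutils.fs, backed by a local directory."""
    def __init__(self, root: Path):
        self._root = root

    def _resolve(self, path: str) -> Path:
        # Map "dbfs:/some/path" onto the local stand-in directory.
        return self._root / path.replace("dbfs:/", "").lstrip("/")

    def ls(self, path: str):
        return [str(p) for p in self._resolve(path).iterdir()]

    def mkdirs(self, path: str) -> bool:
        self._resolve(path).mkdir(parents=True, exist_ok=True)
        return True

class LocalDBUtilsStub:
    """Covers only the dbutils.fs calls we happen to use locally."""
    def __init__(self, local_root: str = "/tmp/dbfs"):
        self.fs = _LocalFs(Path(local_root))

def get_dbutils(spark=None):
    # DATABRICKS_RUNTIME_VERSION is set on Databricks clusters, so its
    # presence is a reasonable "are we on Databricks?" signal.
    if os.environ.get("DATABRICKS_RUNTIME_VERSION"):
        from pyspark.dbutils import DBUtils  # only available on a cluster
        return DBUtils(spark)
    return LocalDBUtilsStub()

Locally, dbutils = get_dbutils() returns the stub and dbfs:/ paths resolve under /tmp/dbfs; on a cluster the same call hands back the real dbutils, so the rest of the code doesn't have to change.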

xiangzhu
Contributor

dbx is great for deployment, but hopefully Spark Connect will be released as soon as possible.

Jfoxyyc
Valued Contributor

I'm actually not a fan of dbx. I prefer the way AWS Glue interactive sessions work in the IDE: it's exactly like the web notebook experience. I can see why dbx exists, but I'd still like a regular notebook experience in my IDE.
