Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Working with Unity Catalog from VSCode using the Databricks Extension

pernilak
New Contributor III

Hi!

As suggested by Databricks, we work with Databricks from VSCode, using Databricks Asset Bundles for our deployments and the VSCode Databricks Extension together with Databricks Connect during development.

However, we are running into some limitations (which hopefully can be fixed). One of them concerns working with files from Unity Catalog volumes using native Python.

E.g., using this code:

with open(my_file, 'r', encoding='utf-8') as f:
    content = f.read()

When running this in the Databricks Workspace, the file is read without issue from:

/Volumes/<my catalog>/<my schema>/<my volume path>/<my file>.xsl

However, running it from VSCode, I get:

No such file or directory: /Volumes/<my catalog>/<my schema>/<my volume path>/<my file>.xslx

I know that the extension works so that Spark commands are executed on the attached cluster, while native Python is run on the local machine. However, shouldn't there be a way to force this to use the cluster as well? It makes no sense to run it locally when I am trying to read a volume path.
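
One possible stop-gap (not from the original post) is to fetch the volume file through the workspace Files API with the Databricks SDK, so the read goes through the workspace even though the Python code itself still runs locally. A minimal sketch, assuming the databricks-sdk package is installed, authentication is already configured (e.g. by the extension's profile), and the placeholder path from above:

from databricks.sdk import WorkspaceClient

# Reuses the authentication profile the extension / Databricks CLI already provides
w = WorkspaceClient()

# Download the volume file via the workspace Files API instead of the local filesystem
resp = w.files.download("/Volumes/<my catalog>/<my schema>/<my volume path>/<my file>.xsl")
content = resp.contents.read().decode("utf-8")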

I know that I can make the entire file "Run as a workflow in Databricks", but I would prefer being able to run cell by cell locally. I also know that if I change my code to use Spark commands, e.g. spark.read(...), then it would work, but I don't think I should be forced to write my code differently just because I want to develop in VSCode, as suggested by Databricks.
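
For reference, the Spark-based variant mentioned above would look roughly like this when run locally against the cluster via Databricks Connect (a sketch assuming Databricks Connect for DBR 13+; the use of spark.read.text and the placeholder path are illustrative, not from the post):

from databricks.connect import DatabricksSession

# DataFrame operations on this session are executed on the remote cluster
spark = DatabricksSession.builder.getOrCreate()

# The read happens on the cluster, so the /Volumes path resolves there
df = spark.read.text("/Volumes/<my catalog>/<my schema>/<my volume path>/<my file>.xsl")
content = "\n".join(row.value for row in df.collect())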

1 REPLY

rustam
New Contributor II

Thank you for the detailed reply, @Retired_mod, and for the great question, @pernilak!

I would also like to code and debug in VS Code while all the code in my Jupyter notebooks is executed on a Databricks cluster cell by cell, with access to the data in our Unity Catalog. As described in the Azure Databricks documentation, Databricks Connect runs only "code involving DataFrame operations on the cluster". Therefore, it seems not to address the original request, or am I missing something?

Is it possible to configure a Databricks cluster as a remote Python interpreter, so that all local code accessed through VS Code is executed on the remote Databricks cluster, just as if I had run it from a Databricks notebook?
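
In the meantime, one partial workaround (my own suggestion, not something confirmed in this thread) is to route non-DataFrame filesystem access through the workspace using the SDK's dbutils proxy. A sketch, assuming the databricks-sdk package, an already-configured authentication profile, and that the proxy can reach Unity Catalog volume paths in your setup:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# These dbutils.fs calls are served by the workspace, not the local filesystem,
# even though this script runs on the local machine
for entry in w.dbutils.fs.ls("/Volumes/<my catalog>/<my schema>/<my volume path>"):
    print(entry.path, entry.size)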

Thank you very much in advance and best regards, 

Rustam 
