Databricks Community

pernilak · ‎04-15-2024

Hi!

As suggested by Databricks, we are working with Databricks from VSCode using Databricks bundles for our deployment and using the VSCode Databricks Extension and Databricks Connect during development.

However, there are some limitations that we are seeing (that hopefully can be fixed). One of them is when working with files from Unity Catalog using native python.

E.g: Using this code:

with open(my_file, 'r', encoding='utf-8') as f:
    content = f.read()

When running this in the Databricks Workspace, I am returned:

/Volumes/<my catalog>/<my schema>/<my volume path>/<my file>.xsl

However, running it from VSCode, I am returned:

No such file or directory: /Volumes/<my catalog>/<my schema>/<my volume path>/<my file>.xslx

I know that the extension works so that spark commands are executed on the attached cluster, but native python works so that it is ran on the machine. However, should there not be a way of forcing this also to use the cluster, as it makes no sense running this locally as I am trying to read a volume path?

I know that a can make the entire file "Run as a workflow in Databricks", but I would prefer being able to run cell by cell locally. I also know that if a change my code to running spark commands, e.g. spark.read(...), then it would work - but I don't think I should be forced to write my code differently just because I want to develop in VSCode as per suggested by Databricks.

rustam · ‎07-22-2024

Thank you for the detailed reply, @Retired_mod and the great question @pernilak!

I would also like to code and debug in VS Code while all the code in my Jupyter notebooks can be executed on a databricks cluster cell by cell with access to the data in our Unity Catalog. As described in this Azure Databricks Documentation, Databricks Connect runs only "code involving DataFrame operations on the cluster". Therefore, it seems not to address the original request or I'm missing something?

Is it possible to configure a databricks cluster as a remote python interpreter so that all local code accessed through VS Code is executed on the remote databricks cluster as I would have executed the code from a databricks notebook?

Thank you very much in advance and best regards,

Rustam