
Accessing UC Volumes using pure python (ML) with databricks-connect

KrzysztofPrzyso
New Contributor III

Hi All,

In my organization, we use Databricks Connect and VS Code for data engineering purposes. This setup is working great, especially for:

- Debugging
- Unit tests
- GitHub Copilot
- Reusable modules and custom libraries

In my view, the developer experience here is significantly better than in notebooks. It's crucial for us that the same code can run on both Databricks Connect/IDE and the Databricks cluster.

Following best practices, we use Unity Catalog for governance and access control, which has been working well, including with UC Volumes to manage access to unstructured data.

In the machine learning, AI, and LLM world, there is a trend towards using pure Python instead of PySpark.

The problem I am facing is accessing files using pure Python (for ML purposes) via Databricks Connect and UC Volumes. This works without issues with PySpark.

Ideally, I would like to avoid:

- Having different code for the cluster and Databricks Connect
- Needing to copy files between Volumes and the local machine (e.g., using dbutils from the Databricks SDK)

On the cluster/notebook, I can directly open and read files from UC Volumes. I would like to have the same capability in Databricks Connect without requiring additional workarounds.
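To make the gap concrete, here is a minimal sketch of the kind of helper I would like to avoid writing. It assumes the volume path is mounted as a local file system on the cluster, and that off-cluster the file is fetched through the Databricks SDK Files API (`WorkspaceClient().files.download`), with the SDK installed and authenticated. The function names and the volume path are hypothetical and only for illustration.

```python
import os
from typing import Callable


def _download_via_sdk(path: str) -> bytes:
    """Fetch a file through the Databricks SDK Files API.

    Assumption: databricks-sdk is installed and authentication is configured
    (for example via a Databricks config profile or environment variables).
    """
    # Imported lazily so the helper also works on machines without the SDK
    # as long as the path is readable locally.
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    return w.files.download(path).contents.read()


def read_volume_bytes(
    path: str,
    remote_reader: Callable[[str], bytes] = _download_via_sdk,
) -> bytes:
    """Read a UC Volume file with plain Python, on or off the cluster.

    On a cluster/notebook the /Volumes/... path exists locally, so a normal
    open() works. Elsewhere the path does not exist, so we fall back to
    remote_reader (by default the SDK download above).
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    return remote_reader(path)


# Hypothetical usage; the catalog, schema, and volume names are placeholders:
# data = read_volume_bytes("/Volumes/my_catalog/my_schema/my_volume/model.bin")
```

This keeps a single call site in the ML code, but it is still a workaround: the whole file is pulled into memory, and libraries that expect a real file path would need a temporary local copy.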
