Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

How to access UnityCatalog's Volume inside Databricks App?

noklamchan
New Contributor II

I am more familiar with DBFS, which seems to have been replaced by Unity Catalog Volumes now. When I create a Databricks App, it allows me to add a resource and pick a UC volume. How do I actually access the volume inside the app?

I cannot find any examples; the app template does have an example of using UC to read a Delta table, but not a volume. My app needs to access some relatively large assets (files of a few hundred MB). Is there a code snippet showing how to access the volume? I checked, and `/Volumes` and `dbutils` are not available inside the app.

 

#UnityCatalog #Databricks App

4 REPLIES

jameshughes
Contributor II

The following code snippets should be sufficient, assuming you have listed the correct dependencies in your App's requirements.txt file.  You would just need to swap in your catalog and schema names below and then point to the correct file or iterate through all files in a directory.

# Read a CSV file from the volume path into a Spark DataFrame
df = spark.read.csv('/Volumes/catalogname/schemaname/files/sourcedata/FileName.csv', header=False, inferSchema=True)
df.show()

# List all files in the volume directory with dbutils
files = dbutils.fs.ls('dbfs:/Volumes/catalogname/schemaname/files/sourcedata/')

for file in files:
    print(file.name)

 

noklamchan
New Contributor II

As I mentioned, I am not reading a table, so Spark is not the right fit here (plus I don't want to include Spark as a dependency just to read a CSV). I also don't have dbutils.

I found this works:
```
from databricks.sdk import WorkspaceClient
from databricks.sdk.core import Config

cfg = Config()  # This is available inside the app
w = WorkspaceClient(host=cfg.host, token=cfg.token)

volume_path = "some_path"
response = w.files.download(volume_path + "file_name")
```

I am just not sure if this is the right way to do it; it would be great if the docs explained clearly what a Databricks App has access to.
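In case it is useful to anyone landing here: the object returned by `w.files.download` exposes the file body as a file-like stream via `.contents`, so a small file can be read straight into memory (a minimal sketch continuing from the snippet above; for files of a few hundred MB, streaming to disk as shown in the reply below is probably the better fit):

```
# response comes from w.files.download(...) above; .contents is a file-like object
data = response.contents.read()
print(len(data), "bytes read from the volume")
```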

Hello, where do you import Config and WorkspaceClient from?

 

nayan_wylde
Esteemed Contributor

Apps don’t mount /Volumes and don’t ship with dbutils. So os.listdir('/Volumes/...') or dbutils.fs.ls(...) won’t work inside an App. Use the Files API or Databricks SDK instead to read/write UC Volume files, then work on a local copy.

Code using the Python SDK to read and download a file:

# requirements.txt
# databricks-sdk>=0.50  (or your pinned version)

import os, tempfile, shutil
from databricks.sdk import WorkspaceClient

# If you used App resources + valueFrom, fetch from env; otherwise hardcode the 3-level name.
CATALOG  = os.environ.get("ASSETS_CATALOG",  "main")
SCHEMA   = os.environ.get("ASSETS_SCHEMA",   "default")
VOLUME   = os.environ.get("ASSETS_VOLUME",   "assets")

# The absolute UC Volumes path to your asset:
volume_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/models/weights.bin"

w = WorkspaceClient()  # in Apps, credentials are auto-injected

# Stream the file down to local disk to avoid loading hundreds of MB into RAM
resp = w.files.download(volume_path)   # Files API via SDK
local_tmp = os.path.join(tempfile.gettempdir(), os.path.basename(volume_path))

with open(local_tmp, "wb") as out:
    # resp.contents is a file-like object; copy in 1MiB chunks
    shutil.copyfileobj(resp.contents, out, length=1024*1024)

# Now use your local file with standard libs (torch, PIL, etc.)

----------------------------------------------------------------------

Code to list files in a volume:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# CATALOG, SCHEMA, VOLUME as defined in the snippet above
dir_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/models/"

# Call Files API: GET /api/2.0/fs/directories{directory_path}
r = w.api_client.do(
    "GET",
    f"/api/2.0/fs/directories{dir_path}",
    # you can pass query params like {"page_size": 1000, "page_token": "..."} if needed
)

entries = r.get("contents", [])
for e in entries:
    print(e["name"], "(dir)" if e.get("is_directory") else f"({e.get('file_size', 0)} bytes)")

--------------------------------------------------

Code to upload a file to a volume:

from databricks.sdk import WorkspaceClient
import io

w = WorkspaceClient()
target = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/incoming/config.json"

data = io.BytesIO(b'{"hello":"world"}')
w.files.upload(target, data, overwrite=True)
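
For larger payloads, the upload call also accepts an open binary file handle, so nothing has to be buffered in memory (a minimal sketch; local_report.csv is just a placeholder file name):

```
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
target = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/incoming/report.csv"

# Stream the local file to the volume instead of building an in-memory buffer
with open("local_report.csv", "rb") as f:
    w.files.upload(target, f, overwrite=True)
```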