
Accessing Databricks Volumes from a Serving Endpoint Using a Custom Model Class in Unity Catalog

VELU1122
New Contributor II

Hi everyone,

I’m looking for a way to access Unity Catalog (UC) Volumes from a Databricks Serving Endpoint. Here’s my current setup:

  • I have a custom AI model class for inference, which I logged into Unity Catalog using mlflow.pyfunc.log_model.
  • I’ve created a Serving Endpoint for this model.

Challenges:

  1. When trying to access UC Volumes directly from my custom class during inference, I get a "No such file or directory" error.
  2. I attempted to mount the UC Volumes within the custom class using dbutils.fs.mount, but when logging the model (mlflow.pyfunc.log_model) I encountered an error saying that dbutils can’t be used in the Spark environment.

Question:

Since the Serving Endpoint runs in an isolated environment, how can I access Unity Catalog Volumes from within my custom model class during inference?

Any guidance on solving this issue or alternative methods to access UC Volumes from a Serving Endpoint would be greatly appreciated.

Thanks in advance


3 REPLIES

VELU1122
New Contributor II

Additionally, I log the model as shown below, with MicrosoftResnet50Model being my custom inference class with load_context and predict methods:
with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        REGISTERED_MODEL_NAME,
        python_model=MicrosoftResnet50Model(),
        input_example=api_input_example,
        artifacts={"model_path": MODEL_PATH},
        pip_requirements=[
            f"transformers=={transformers.__version__}",
            "torch==2.0.1",
        ],
        signature=signature,
        registered_model_name=f"{CATALOG}.{SCHEMA}.{REGISTERED_MODEL_NAME}",
    )

Lloetters
New Contributor II

Hey VELU1122,

Did you find a solution for this? We are currently struggling with the same problem.

Thanks

Lukas Lötters
Data Scientist @ ORALYIS

Louis_Frolio
Databricks Employee

Greetings @VELU1122, you’re correct that the Databricks Model Serving container is isolated, so you can’t rely on cluster-only affordances like mounts or executor-distributed file utilities. The reliable way to read from Unity Catalog (UC) Volumes in a serving endpoint is to use the Databricks Files API / SDK with an endpoint-injected credential, and address files by their UC Volumes path, for example /Volumes/<catalog>/<schema>/<volume>/<relative_path>.

 

What works from Model Serving

  • Use the Files REST API or the Databricks SDK (WorkspaceClient.files) to list, download, and upload files in UC Volumes with paths like /Volumes/<catalog>/<schema>/<volume>/... (see the short sketch after this list). This is supported for managing and reading files directly from Volumes, and avoids the need for dbutils or mounts inside the serving container.
  • Inject credentials into the serving container using environment variables backed by Databricks secrets. Define DATABRICKS_HOST as plain text and DATABRICKS_TOKEN (or use OAuth for a service principal) as a secret in the endpoint config; then use the SDK to call the Files API at inference time.
  • Ensure the endpoint’s identity (the user or service principal that created the endpoint) has UC privileges (for example, READ FILES on the Volume). Endpoint identity is fixed at creation and is used for UC access checks; if it lacks privileges, recreate the endpoint under an identity that has access.
  • If your Volume is external, you can also access data via cloud URIs (s3://, abfss://, gs://) as part of Volumes GA, but you still must provide cloud credentials in the serving container (for example via an instance profile on the endpoint or provider-specific auth). For many scenarios, the Files API / SDK is simpler and keeps governance in UC.
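
To make the first bullet above concrete, here is a minimal sketch of listing and downloading Volume files with the SDK. It assumes databricks-sdk is installed in the serving container, DATABRICKS_HOST / DATABRICKS_TOKEN are injected as described in the recommended pattern below, and the catalog/schema/volume names are placeholders (method names are worth double-checking against your SDK version):

```python
import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],
    token=os.environ["DATABRICKS_TOKEN"],
)

volume_dir = "/Volumes/main/default/my_volume/images"  # illustrative namespace

# List the files under a directory in the UC Volume
for entry in w.files.list_directory_contents(volume_dir):
    print(entry.path)

# Download a single file; the response exposes a readable stream
resp = w.files.download(f"{volume_dir}/cat.jpg")
img_bytes = resp.contents.read()
```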

What doesn’t work in Model Serving

  • Avoid dbutils.fs.mount or relying on FUSE-style local paths in serving containers; use Files API / SDK instead. Model Serving doesn’t run notebook executors and doesn’t support the same dbutils semantics; Volumes are intended for path-based governance and programmatic access via APIs and POSIX-like paths, not runtime mounts in serving.

Recommended pattern

  1. Configure environment variables with secrets on your endpoint:
     • In the Serving UI or via REST/SDK, add:
       • DATABRICKS_HOST: https://<your-workspace-url> (plain text)
       • DATABRICKS_TOKEN: {{secrets/<scope>/<key>}} (secret)
     • Alternatively, use OAuth M2M for a service principal: inject DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET, fetch short-lived tokens at runtime, and then call the Files API. This avoids PATs and is recommended for unattended endpoints (a minimal sketch follows these steps).
  2. From your custom python_model class, read files with the SDK:

```python
import io
import os

import mlflow
from databricks.sdk import WorkspaceClient


class MicrosoftResnet50Model(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        host = os.environ["DATABRICKS_HOST"]
        token = os.environ["DATABRICKS_TOKEN"]  # or build an OAuth client and fetch an access token
        self.w = WorkspaceClient(host=host, token=token)

    def _read_volume_file(self, path: str) -> bytes:
        # path like "/Volumes/<catalog>/<schema>/<volume>/images/cat.jpg"
        resp = self.w.files.download(path)  # resp.contents is a readable stream
        return resp.contents.read()

    def predict(self, context, model_input):
        # Example: model_input contains file names relative to your volume
        catalog, schema, volume = context.artifacts.get("uc_volume_ns", ("main", "default", "my_volume"))
        rel_path = model_input.get("relative_path")  # e.g., "images/cat.jpg"
        volume_path = f"/Volumes/{catalog}/{schema}/{volume}/{rel_path}"  # must include the volume name
        img_bytes = self._read_volume_file(volume_path)
        # ... open bytes with PIL, transform, run inference, return outputs ...
        # return predictions
```
  3. Pass any constant namespace values or paths you need as artifacts/params when logging the model, or as endpoint environment variables, so your class can construct the /Volumes/... path at runtime.
  4. If you truly need direct cloud access (for external Volumes), configure the endpoint with an instance profile or provider credentials and use the cloud SDK/URI. Otherwise, prefer the Files API route for simplicity and governance consistency.
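
For the OAuth M2M alternative mentioned in step 1, a minimal sketch could look like the following. It assumes databricks-sdk is installed in the serving environment and that the endpoint injects the client credentials as environment variables (the variable names here are illustrative, not required by the SDK):

```python
import os

from databricks.sdk import WorkspaceClient

# The SDK performs the OAuth token exchange for the service principal itself;
# no PAT is needed and the tokens are short-lived.
w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)

# Files API calls then work exactly as with token auth:
resp = w.files.download("/Volumes/<catalog>/<schema>/<volume>/images/cat.jpg")
img_bytes = resp.contents.read()
```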

Why the errors occur

  • “No such file or directory” happens when using local filesystem paths that aren’t available in the serving container; UC Volume access in Serving should go through the Files API/SDK and Volume paths, not mounts.
  • dbutils is notebook/cluster-bound; Model Serving supports environment variables and secrets injection for external access, not dbutils mounts. Use the Files API / SDK instead of dbutils in serving.

Alternative strategies

  • If the files are static assets required for inference (labels, templates, small configs), bundle them as MLflow model artifacts at log time and access them via context.artifacts rather than reaching out to Volumes during inference (a minimal sketch follows this list). This reduces I/O and removes external dependencies at serving time.
  • For high-throughput batch scenarios that require broad data scans, consider Jobs on UC-enabled compute reading from Volumes with Spark, and write outputs to tables; use Model Serving for low-latency point queries. Volumes are fully supported across Spark, SQL, dbutils, REST, CLI, and SDKs, so you can mix patterns as needed.
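
If you go the bundled-artifacts route from the first bullet, a rough sketch of the pattern is shown below; the file name and artifact key are illustrative. The file is copied into the model at log time and resolved to a local path inside the serving container:

```python
import mlflow


class MicrosoftResnet50Model(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # "labels" is whatever key you used in the artifacts dict at log time;
        # context.artifacts["labels"] is a local path inside the serving container.
        with open(context.artifacts["labels"]) as f:
            self.labels = [line.strip() for line in f]

    def predict(self, context, model_input):
        # ... run inference and map predicted indices to self.labels ...
        return model_input


# At log time (on a UC-enabled cluster, /Volumes paths resolve via FUSE):
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        "resnet50",
        python_model=MicrosoftResnet50Model(),
        artifacts={"labels": "/Volumes/main/default/my_volume/labels.txt"},
    )
```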

Endpoint config snippets

Create or update the endpoint with secret-based environment variables:

```json
{
  "name": "uc-model-endpoint",
  "config": {
    "served_entities": [
      {
        "entity_name": "myCatalog.mySchema.myModel",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": true,
        "environment_vars": {
          "DATABRICKS_HOST": "https://<workspace-url>",
          "DATABRICKS_TOKEN": "{{secrets/my_scope/my_token_key}}"
        }
      }
    ]
  }
}
```
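
If you prefer to do this from Python rather than raw REST, the Databricks SDK exposes the same configuration. A rough sketch, assuming a reasonably recent databricks-sdk (class and field names should be double-checked against your installed version):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()  # picks up auth from the environment or ~/.databrickscfg

w.serving_endpoints.create(
    name="uc-model-endpoint",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="myCatalog.mySchema.myModel",
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
                environment_vars={
                    "DATABRICKS_HOST": "https://<workspace-url>",
                    "DATABRICKS_TOKEN": "{{secrets/my_scope/my_token_key}}",
                },
            )
        ]
    ),
)
```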
Key references if you want to dig deeper:

  • Files API and SDK examples for Volumes, including REST paths (/api/2.0/fs/files/Volumes/...) and SDK usage in WorkspaceClient.files.
  • Volumes GA capabilities and cloud URI access for external Volumes.
  • Volumes object model, path rules, and limitations (paths must include the volume name; Volumes are intended for path-based access).
  • Serving endpoint identity and UC access implications.
Hope this helps, Louis.