
How to upload a file to Unity catalog volume using databricks asset bundles

Venugopal
New Contributor III

Hi,

I am working on a CI/CD blueprint for developers. With it, developers can create a bundle for their jobs/workflows and then create a volume to which they upload a wheel or JAR file that is used as a dependency in their notebook. I don't want to use the Python SDK or any similar programmatic way to upload that file to a volume, as described here: https://learn.microsoft.com/en-us/azure/databricks/files/volumes#manage-files-in-volumes-from-extern.... I want the file to be uploaded to the volume as part of the bundle deployment process, so that when the bundle runs it can reference that file in the volume at runtime.

Thanks in advance,

Venu

3 REPLIES

binitchowdhary
New Contributor II

Hi Venugopal, 

I have a similar requirement. Did you find a solution for this?

chanukya-pekala
Contributor II

Deploying Databricks Asset Bundle Artifacts to Unity Catalog Volumes

Use Databricks Asset Bundle with a deployment job that leverages shell commands to copy artifacts from workspace bundle paths to Unity Catalog Volumes.

Configuration

databricks.yml

bundle:
  name: artifact-deployer

# Variables are declared at the bundle's top level; targets can override their values.
variables:
  catalog:
    default: "your_catalog"
  schema:
    default: "your_schema"
  volume:
    default: "artifacts"

targets:
  dev:
    workspace:
      host: "https://your-workspace.cloud.databricks.com"
    
    resources:
      jobs:
        deploy_artifacts:
          name: "Deploy Artifacts to Volume"
          tasks:
            - task_key: "copy_to_volume"
              notebook_task:
                notebook_path: "./notebooks/deploy"
                base_parameters:
                  catalog: ${var.catalog}
                  schema: ${var.schema}
                  volume: ${var.volume}
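
With this configuration in place, deployment and the copy job can be triggered from the Databricks CLI. A minimal sketch, assuming the dev target and the deploy_artifacts job key defined above:

# Validate and deploy the bundle (syncs bundle files to the workspace bundle path)
databricks bundle validate -t dev
databricks bundle deploy -t dev

# Run the job that copies the artifacts into the volume
databricks bundle run deploy_artifacts -t dev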

Deployment Notebook (notebooks/deploy.ipynb)

Cell 1: Setup Parameters

# Create widgets for parameters
dbutils.widgets.text("catalog", "")
dbutils.widgets.text("schema", "") dbutils.widgets.text("volume", "")

# Get parameter values
catalog_name = dbutils.widgets.get("catalog")
schema_name = dbutils.widgets.get("schema")
volume_name = dbutils.widgets.get("volume")

print(f"Target: {catalog_name}.{schema_name}.{volume_name}")

Cell 2: Create Volume and Directory Structure

# Create Unity Catalog Volume
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog_name}.{schema_name}.{volume_name}")

# Create directory structure
dbutils.fs.mkdirs(f"/Volumes/{catalog_name}/{schema_name}/{volume_name}/wheels/")
dbutils.fs.mkdirs(f"/Volumes/{catalog_name}/{schema_name}/{volume_name}/notebooks/")

print("Volume and directories created successfully")

Cell 3: Define Paths and Environment Variables

# Get current user and define paths
username = dbutils.notebook.entry_point.getDbutils().notebook().getContext().userName().get()
bundle_path = f"/Workspace/Users/{username}/.bundle/your_repo_name/dev/files"
volume_path = f"/Volumes/{catalog_name}/{schema_name}/{volume_name}"

# Set environment variables for shell access
import os

os.environ['BUNDLE_PATH'] = bundle_path
os.environ['VOLUME_PATH'] = volume_path
print(f"Bundle Path: {bundle_path}")
print(f"Volume Path: {volume_path}")

Cell 4: Copy Artifacts Using Shell Commands

%%sh
echo "Copying from: $BUNDLE_PATH"
echo "Copying to: $VOLUME_PATH"

# Copy notebooks
if [ -d "$BUNDLE_PATH/notebooks" ]; then
    cp -r "$BUNDLE_PATH/notebooks/"* "$VOLUME_PATH/notebooks/"
    echo "Notebooks copied successfully"
else
    echo "Notebooks directory not found at $BUNDLE_PATH/notebooks"
fi

# Copy Python wheels
if [ -d "$BUNDLE_PATH/dist" ]; then
    cp "$BUNDLE_PATH/dist/"*.whl "$VOLUME_PATH/wheels/" 2>/dev/null && \
    echo "Python wheels copied successfully" || \
    echo "No wheel files found in $BUNDLE_PATH/dist"
else
    echo "Dist directory not found at $BUNDLE_PATH/dist"
fi

# Verify deployment
echo ""
echo "Deployment Summary:"
echo "Notebooks in volume:"
find "$VOLUME_PATH/notebooks/" -type f 2>/dev/null | wc -l || echo "0"

echo "Wheels in volume:" find "$VOLUME_PATH/wheels/" -type f 2>/dev/null | wc -l || echo "0"

Key Technical Points

  1. Parameter Flow: databricks.yml variables → base_parameters → notebook widgets → Python variables
  2. Path Access: Bundle artifacts at /Workspace/Users/{user}/.bundle/{repo}/dev/files/ are accessible via shell but not dbutils.fs
  3. Environment Bridge: os.environ passes Python variables to shell commands
  4. Volume Paths: Unity Catalog Volumes are accessible at /Volumes/{catalog}/{schema}/{volume}/ (the same FUSE path the shell commands above use); a quick check of both paths is sketched after this list
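
A quick way to verify points 2 and 4 from inside the deployment notebook is to list both locations from a shell cell; a minimal sketch, assuming the environment variables exported in Cell 3:

%%sh
# Workspace bundle path (synced by `databricks bundle deploy`)
echo "Bundle files:"
ls -R "$BUNDLE_PATH" | head -n 20

# Unity Catalog Volume FUSE path
echo "Volume files:"
ls -R "$VOLUME_PATH" | head -n 20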

You can replace the wheel with a JAR file. The reason for not doing this programmatically is that shell commands can access /Workspace paths directly, while Python file operations and dbutils.fs cannot. We still need to tweak the shell script a bit to fetch the latest wheel version, which is an enhancement, but it works for now. Also, this simple job can run on serverless compute, so the deployment is quick (< 2 min) and doesn't need to wait 4-5 minutes for a job cluster to boot, so there is no need for a job cluster here.
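
For example, one hedged option for that enhancement is to pick the wheel by modification time (not by parsing the version string); a rough sketch that would replace the wheel-copy block in Cell 4:

# Pick the most recently modified wheel in the bundle's dist directory
LATEST_WHL=$(ls -t "$BUNDLE_PATH/dist/"*.whl 2>/dev/null | head -n 1)
if [ -n "$LATEST_WHL" ]; then
    cp "$LATEST_WHL" "$VOLUME_PATH/wheels/"
    echo "Copied $(basename "$LATEST_WHL")"
else
    echo "No wheel files found in $BUNDLE_PATH/dist"
fi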

Chanukya

chanukya-pekala
Contributor II

With this setup, users who are entitled to access the catalog will also have access to use the volume, provided permissions are set up that way. Users will be able to use the notebook, and we need to provide documentation on whether to clone the notebook and run it or run it directly (depending on the use case). Most importantly, developers will be able to reference the wheel uploaded to the volume and install it on specific job or ad-hoc clusters.
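
As an illustration (the job name, notebook path, and wheel file name below are placeholders), a job in databricks.yml could reference the uploaded wheel from the volume as a task library; on an ad-hoc cluster, the same /Volumes/... path can be installed from a notebook cell with %pip install:

# Hypothetical job definition; wheel file name and notebook path are placeholders
resources:
  jobs:
    consume_wheel_job:
      name: "Job using the wheel from the volume"
      tasks:
        - task_key: "run_notebook"
          notebook_task:
            notebook_path: "./notebooks/my_notebook"
          libraries:
            - whl: "/Volumes/your_catalog/your_schema/artifacts/wheels/my_package-0.1.0-py3-none-any.whl"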

Chanukya