02-14-2025 09:10 AM
Hi,
I am working on a CI/CD blueprint for developers. Using it, developers can create a bundle for their jobs/workflows and then create a volume to which they upload a wheel file or a JAR file, which is used as a dependency in their notebooks. I don't want to use the Python SDK or any other programmatic way to upload that file to a volume, as described here: https://learn.microsoft.com/en-us/azure/databricks/files/volumes#manage-files-in-volumes-from-extern.... I want the file to be uploaded to the volume as part of the bundle deployment process, so that when the bundle runs, it can reference that file in the volume at runtime.
Thanks in advance,
Venu
05-20-2025 07:33 AM
Hi Venugopal,
I have a similar requirement. Did you find a solution for this?
09-15-2025 05:37 AM
Use a Databricks Asset Bundle with a deployment job that uses shell commands to copy artifacts from the workspace bundle path to a Unity Catalog Volume.
bundle:
  name: artifact-deployer

variables:
  catalog:
    default: "your_catalog"
  schema:
    default: "your_schema"
  volume:
    default: "artifacts"

targets:
  dev:
    workspace:
      host: "https://your-workspace.cloud.databricks.com"

resources:
  jobs:
    deploy_artifacts:
      name: "Deploy Artifacts to Volume"
      tasks:
        - task_key: "copy_to_volume"
          notebook_task:
            notebook_path: "./notebooks/deploy"
            base_parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}
              volume: ${var.volume}
# Create widgets for parameters
dbutils.widgets.text("catalog", "")
dbutils.widgets.text("schema", "")
dbutils.widgets.text("volume", "")

# Get parameter values
catalog_name = dbutils.widgets.get("catalog")
schema_name = dbutils.widgets.get("schema")
volume_name = dbutils.widgets.get("volume")

print(f"Target: {catalog_name}.{schema_name}.{volume_name}")
# Create Unity Catalog Volume
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog_name}.{schema_name}.{volume_name}")

# Create directory structure
dbutils.fs.mkdirs(f"/Volumes/{catalog_name}/{schema_name}/{volume_name}/wheels/")
dbutils.fs.mkdirs(f"/Volumes/{catalog_name}/{schema_name}/{volume_name}/notebooks/")

print("Volume and directories created successfully")
# Get current user and define paths
username = dbutils.notebook.entry_point.getDbutils().notebook().getContext().userName().get()
bundle_path = f"/Workspace/Users/{username}/.bundle/your_repo_name/dev/files"
volume_path = f"/Volumes/{catalog_name}/{schema_name}/{volume_name}"

# Set environment variables for shell access
import os
os.environ['BUNDLE_PATH'] = bundle_path
os.environ['VOLUME_PATH'] = volume_path

print(f"Bundle Path: {bundle_path}")
print(f"Volume Path: {volume_path}")
%%sh
echo "Copying from: $BUNDLE_PATH"
echo "Copying to: $VOLUME_PATH"

# Copy notebooks
if [ -d "$BUNDLE_PATH/notebooks" ]; then
  cp -r "$BUNDLE_PATH/notebooks/"* "$VOLUME_PATH/notebooks/"
  echo "Notebooks copied successfully"
else
  echo "Notebooks directory not found at $BUNDLE_PATH/notebooks"
fi

# Copy Python wheels
if [ -d "$BUNDLE_PATH/dist" ]; then
  cp "$BUNDLE_PATH/dist/"*.whl "$VOLUME_PATH/wheels/" 2>/dev/null && \
    echo "Python wheels copied successfully" || \
    echo "No wheel files found in $BUNDLE_PATH/dist"
else
  echo "Dist directory not found at $BUNDLE_PATH/dist"
fi

# Verify deployment
echo ""
echo "Deployment Summary:"
echo "Notebooks in volume:"
find "$VOLUME_PATH/notebooks/" -type f 2>/dev/null | wc -l || echo "0"
echo "Wheels in volume:"
find "$VOLUME_PATH/wheels/" -type f 2>/dev/null | wc -l || echo "0"
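For completeness, a minimal sketch of how this would be wired into a CI/CD pipeline, assuming the Databricks CLI with bundle support is installed and authenticated on the runner; the target name dev and the job key deploy_artifacts come from the example databricks.yml above:

# From the CI/CD runner: validate, deploy, then trigger the artifact-copy job
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run deploy_artifacts -t dev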
You can replace the wheel with a JAR file. The reason for not doing this programmatically is that shell commands can access /Workspace paths directly, while Python file operations and dbutils.fs cannot. We still need to tweak the shell script to pick up the latest wheel version, which is a planned enhancement (see the sketch below), but it works for now. Also, this simple job can run on serverless compute, so the deployment is quick (< 2 min) and doesn't have to wait for the 4-5 minute cluster boot-up, meaning there is no need for a job cluster here.
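As a rough illustration of that enhancement, the copy step could be narrowed down to the most recently built wheel instead of copying everything. This is only a sketch reusing the BUNDLE_PATH and VOLUME_PATH variables from the cells above, not a tested part of the original bundle:

# Sketch: copy only the most recently built wheel from the bundle's dist folder
latest_whl=$(ls -t "$BUNDLE_PATH/dist/"*.whl 2>/dev/null | head -n 1)
if [ -n "$latest_whl" ]; then
  cp "$latest_whl" "$VOLUME_PATH/wheels/"
  echo "Copied latest wheel: $latest_whl"
else
  echo "No wheel files found in $BUNDLE_PATH/dist"
fi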
09-15-2025 05:42 AM
With this setup, users who are entitled to access the catalog will also be able to use the volume, provided permissions are granted that way. Users can also make use of the deployment notebook; we just need to document whether they should clone it and run it or run it directly (depending on the use case). Most importantly, developers will be able to reference the artifacts uploaded to the volume and install them on specific job clusters or ad hoc clusters, for example as shown below.
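For instance, once a wheel has been copied into the volume, it can be installed notebook-scoped straight from the volume path. The package file name and version below are placeholders; only the catalog, schema, and volume names come from the example config above:

%pip install /Volumes/your_catalog/your_schema/artifacts/wheels/your_package-0.1.0-py3-none-any.whl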