
Error when uploading MLFlow artifacts to DBFS

xming
New Contributor II

Hi everyone,

I'm attempting to use MLFlow experiment tracking from a local machine, but I'm encountering difficulties in uploading artifacts.

I've tried sample code as simple as the following.

import mlflow
import os

os.environ["DATABRICKS_HOST"] = "https://XXXXXX.cloud.databricks.com/"
os.environ["DATABRICKS_TOKEN"] = "dapiXXXXX"

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("XXXX")

with mlflow.start_run() as run:
    mlflow.log_param("param1", 5)
    mlflow.log_metric("foo", 1, step=0)
    mlflow.log_metric("foo", 2, step=1)
    mlflow.log_metric("foo", 3, step=2)
    mlflow.log_metric("foo", 4, step=3)
    mlflow.log_metric("foo", 5, step=4)
    mlflow.log_artifact("main.py")

This code successfully created a new run in the target MLFlow experiment and logged the parameter "param1" and the metric "foo" correctly. However, it failed to log the artifact and displayed an error message like the following.

mlflow.exceptions.MlflowException: 403 Client Error: Forbidden for url: https://dbstorage-prod-whkxn.s3.ap-southeast-2.amazonaws.com/ws/xxxxxxxxxxxxxxxxx (an AWS presigned URL). Response text: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>xxxxxxxxxxxxxxxx</RequestId><HostId>xxxxxxxxxxxxx</HostId></Error>

Do I need any additional configuration to make artifact logging work?

3 REPLIES

BigRoux
Databricks Employee

A couple of things:

1. If you don't own the MLflow experiment, you need edit permissions on it (required for logging). The default artifact location in DBFS (`dbfs:/databricks/mlflow-tracking/`) requires explicit write permissions.

2. Make sure you have the proper entitlements to write to the location you are targeting (you can check which artifact location the experiment is actually using, as shown in the sketch below).

3. Unity Catalog volumes require the relevant privileges on the catalog, schema, and volume (e.g. `USE CATALOG`, `USE SCHEMA`, and `WRITE VOLUME`), if you are using Unity Catalog.
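
For example, here is a minimal sketch you can run from your local machine with the same tracking setup as your snippet (the experiment name is a placeholder) to see which artifact location the run is writing to:

import mlflow

mlflow.set_tracking_uri("databricks")

# Placeholder name: use the same experiment name you pass to mlflow.set_experiment()
experiment = mlflow.get_experiment_by_name("/Users/you@example.com/my-experiment")
if experiment is not None:
    # e.g. dbfs:/databricks/mlflow-tracking/<experiment-id> for the default location
    print(experiment.artifact_location)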

Hope this helps, Louis.

xming
New Contributor II

Hi,

Thank you for the advice! I managed to upload artifacts by creating a Unity Catalog volume and explicitly setting it as the artifact location.

However, I am still wondering whether it is possible to upload artifacts to the default DBFS artifact location. How can I grant explicit write permissions on the default location?
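
For reference, this is roughly what worked for me (a minimal sketch; the experiment name and the /Volumes path are placeholders for my actual catalog, schema, and volume):

import mlflow

mlflow.set_tracking_uri("databricks")

# Create the experiment with a Unity Catalog volume as its artifact location
# (only needed once; afterwards set_experiment() is enough).
experiment_name = "/Users/me@example.com/uc-artifact-experiment"
if mlflow.get_experiment_by_name(experiment_name) is None:
    mlflow.create_experiment(
        experiment_name,
        artifact_location="dbfs:/Volumes/my_catalog/my_schema/mlflow_artifacts",
    )
mlflow.set_experiment(experiment_name)

with mlflow.start_run():
    mlflow.log_artifact("main.py")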

BigRoux
Databricks Employee

It is considered best practice not to store any production data or assets in DBFS (Databricks File System). The primary reason is that DBFS does not provide robust security controls: anyone with workspace access can potentially access items stored there. Instead, Databricks strongly recommends using Unity Catalog for managing and securing your data and AI assets. Unity Catalog offers centralized access control, fine-grained permissions, and enhanced auditing capabilities, making it the preferred solution for production workloads.

DBFS is now considered a legacy storage option and both DBFS mounts and root storage are deprecated due to security risks and their incompatibility with Unity Catalog’s governance model. While there is no official deprecation date yet, it is advisable to migrate your production assets to Unity Catalog Volumes to ensure future compatibility and security.

In summary, use Unity Catalog for all production data and AI assets, and avoid storing anything critical in DBFS.

Hope this helps, Big Roux.
