
Error when logging artifact OSError: [Errno 5] Input/output error: '/dbfs/Volumes'

Sangamswadik
New Contributor III

Hi, I'm building a Streamlit application on Databricks Apps where users can upload some data, and I run an LLM model and return the results. There, I want to log an artifact to a volume. I'm following this documentation:

https://docs.databricks.com/aws/en/mlflow/experiments 

Here's my code:

catalog = "testing"
schema = "model-data"
volume = "training-data"
experiment_name =   "/Shared/test_run"
a=5
b=7

artifact_location = f"dbfs:/Volumes/{catalog}/{schema}/{volume}/mlflow_artifacts"

if mlflow.get_experiment_by_name(experiment_name) is None:
    mlflow.create_experiment(name=experiment_name, artifact_location=artifact_location)
mlflow.set_experiment(experiment_name)


csv_path = f"/Volumes/{catalog}/{schema}/{volume}/mmm_data.csv"
df = pd.read_csv(csv_path)
print("Sample data:")
print(df.head(3))

# --- Start an MLflow run ---
with mlflow.start_run() as run:
    result = a + b

    # Log parameters and metric to MLflow
    mlflow.log_param("a", a)
    mlflow.log_param("b", b)
    mlflow.log_metric("sum_result", result)
    
    # --- Save the DataFrame as a CSV file locally (temporary location) ---
    temp_csv_path = "/tmp/mmm_data.csv"
    df.to_csv(temp_csv_path, index=False)
    
    # Log the CSV file as an artifact.
    # this artifact will be stored under artifact_location/data/
    mlflow.log_artifact(temp_csv_path, artifact_path="data")

I'm getting this error when I run this:

OSError: [Errno 5] Input/output error: '/dbfs/Volumes'
File <command-4809454479493761>, line 16
     12 df.to_csv(temp_csv_path, index=False)
     14 # Log the CSV file as an artifact.
     15 # this artifact will be stored under artifact_location/data/
---> 16 mlflow.log_artifact(temp_csv_path, artifact_path="data")

I think it is some issue with permissions, but I'm unable to figure it out. Any help would be greatly appreciated!

ACCEPTED SOLUTION

mark_ott
Databricks Employee

The error

text
OSError: [Errno 5] Input/output error: '/dbfs/Volumes'

occurs because Databricks Apps (including Streamlit apps running on Databricks) currently do not have direct write access to /dbfs/Volumes for artifact logging via MLflow within the app execution environment.

Here are the main causes and solutions you can apply:


Why It Happens

  1. Permissions issue with Volumes

    • Unity Catalog Volumes require specific privileges: USE CATALOG and USE SCHEMA on the parent objects, plus READ VOLUME and WRITE VOLUME on the volume itself.

    • Your Databricks App likely runs under a service principal or app environment that does not have write access to the target volume.

  2. DBFS and Volume availability in Databricks Apps

    • In the Databricks Apps runtime, paths like /dbfs/Volumes/... may not be mounted or accessible as standard local paths for direct file operations. This leads to input/output errors even when the volume exists.

  3. Custom artifact location

    • When using a custom artifact_location (e.g., dbfs:/Volumes/...), you must ensure MLflow supports that location and your client version is 2.15.0 or later. Otherwise, MLflow may misinterpret the artifact root and fail to write files. A quick diagnostic sketch follows this list.

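Before changing anything, it can help to confirm what the runtime actually sees. The following is a minimal diagnostic sketch (not from the original answer) that prints the MLflow client version and checks whether the Volume is reachable and writable as a local path from the environment your code runs in; the catalog/schema/volume names are taken from the question and should be replaced with your own.

python
import os
import mlflow

# Names from the question; adjust to your own catalog/schema/volume.
catalog, schema, volume = "testing", "model-data", "training-data"
volume_dir = f"/Volumes/{catalog}/{schema}/{volume}"

# 1. MLflow client version (writing artifacts to UC Volumes needs a recent client).
print("mlflow version:", mlflow.__version__)

# 2. Can this environment see and write to the Volume as a local path?
print("volume path exists:", os.path.exists(volume_dir))
try:
    test_file = os.path.join(volume_dir, "_write_test.txt")
    with open(test_file, "w") as f:
        f.write("ok")
    os.remove(test_file)
    print("volume is writable from this environment")
except OSError as e:
    print("cannot write to volume from this environment:", e)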

Fix: Recommended Approaches

Option 1: Use MLflow-managed storage

Remove the custom artifact location and let MLflow assign the default tracked location:

python
experiment_name = "/Shared/test_run" mlflow.set_experiment(experiment_name)

Databricks-managed artifact storage automatically applies the proper permissions (artifacts are stored under dbfs:/databricks/mlflow-tracking/…).
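As a minimal end-to-end sketch of this option (assuming MLflow tracking is available where the code runs, e.g. a notebook, and using placeholder data), simply omit artifact_location and log the CSV against the experiment's managed artifact store:

python
import mlflow
import pandas as pd

# No artifact_location: MLflow uses its managed artifact store for this experiment.
experiment_name = "/Shared/test_run"
mlflow.set_experiment(experiment_name)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # placeholder data

with mlflow.start_run():
    temp_csv_path = "/tmp/mmm_data.csv"
    df.to_csv(temp_csv_path, index=False)
    # The artifact lands under the experiment's managed artifact root, e.g. .../artifacts/data/
    mlflow.log_artifact(temp_csv_path, artifact_path="data")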

Option 2: Use a Unity Catalog Volume with proper permissions

If you must use a specific Volume:

  1. Ensure your catalog/schema/volume exist and grant privileges:

    sql
    GRANT USE CATALOG ON CATALOG testing TO `your_user_or_service_principal`;
    GRANT USE SCHEMA ON SCHEMA testing.`model-data` TO `your_user_or_service_principal`;
    GRANT READ VOLUME, WRITE VOLUME ON VOLUME testing.`model-data`.`training-data` TO `your_user_or_service_principal`;
  2. Use a recent MLflow version (2.15.0 or later):

    python
    %pip install --upgrade mlflow
  3. Set artifact_location as follows (a combined sketch follows this list):

    python
    artifact_location = f"dbfs:/Volumes/{catalog}/{schema}/{volume}/mlflow_artifacts"

Option 3: Log to temporary local storage, then move manually

Log artifacts to a local temp directory like /tmp and copy them afterward:

python
mlflow.log_artifact(temp_csv_path, artifact_path="data")

Follow with:

python
dbutils.fs.cp("file:/tmp/mmm_data.csv", f"dbfs:/Volumes/{catalog}/{schema}/{volume}/mmm_data.csv")

Summary of Key Recommendations

  • Lack of permissions on the Unity Catalog volume → grant READ VOLUME and WRITE VOLUME (plus USE CATALOG and USE SCHEMA).
  • /dbfs inaccessible within Databricks Apps → use dbutils.fs.cp or the default MLflow artifact store.
  • Outdated MLflow version → upgrade to 2.15.0 or later.
  • Custom artifact location not supported → prefer the default MLflow-managed location.

Following these steps will allow your Streamlit app in Databricks to log artifacts without hitting the OSError: [Errno 5] Input/output error.


