03-29-2025 02:17 AM - edited 03-29-2025 02:40 AM
Hi, I'm building a Streamlit application on Databricks Apps where users can upload some data, and I run an LLM on it and return the results. As part of this, I want to log an artifact to a volume. I'm following this documentation:
https://docs.databricks.com/aws/en/mlflow/experiments
Here's my code:

import mlflow
import pandas as pd

catalog = "testing"
schema = "model-data"
volume = "training-data"
experiment_name = "/Shared/test_run"
a = 5
b = 7
artifact_location = f"dbfs:/Volumes/{catalog}/{schema}/{volume}/mlflow_artifacts"
if mlflow.get_experiment_by_name(experiment_name) is None:
    mlflow.create_experiment(name=experiment_name, artifact_location=artifact_location)
mlflow.set_experiment(experiment_name)
csv_path = f"/Volumes/{catalog}/{schema}/{volume}/mmm_data.csv"
df = pd.read_csv(csv_path)
print("Sample data:")
print(df.head(3))
# --- Start an MLflow run ---
with mlflow.start_run() as run:
    result = a + b
    # Log parameters and metric to MLflow
    mlflow.log_param("a", a)
    mlflow.log_param("b", b)
    mlflow.log_metric("sum_result", result)
    # --- Save the DataFrame as a CSV file locally (temporary location) ---
    temp_csv_path = "/tmp/mmm_data.csv"
    df.to_csv(temp_csv_path, index=False)
    # Log the CSV file as an artifact.
    # This artifact will be stored under artifact_location/data/
    mlflow.log_artifact(temp_csv_path, artifact_path="data")

I'm getting this error when I run it:
OSError: [Errno 5] Input/output error: '/dbfs/Volumes'
File <command-4809454479493761>, line 16
12 df.to_csv(temp_csv_path, index=False)
14 # Log the CSV file as an artifact.
15 # this artifact will be stored under artifact_location/data/
---> 16 mlflow.log_artifact(temp_csv_path, artifact_path="data")
I think it is a permissions issue, but I'm unable to figure it out. Any help would be greatly appreciated!
10-24-2025 07:21 AM
The error
OSError: [Errno 5] Input/output error: '/dbfs/Volumes'
occurs because Databricks Apps (including Streamlit apps running on Databricks) currently do not have direct write access to /dbfs/Volumes for artifact logging via MLflow within the app execution environment.
Here are the main causes and solutions you can apply:
Why It Happens
- Permissions issue with Volumes
  - Accessing a Unity Catalog volume requires USE CATALOG and USE SCHEMA on the parent catalog and schema, plus READ VOLUME to read and WRITE VOLUME to write.
  - Your Databricks App likely runs under a service principal or app environment that does not have write access to the target volume.
- DBFS and Volume availability in Databricks Apps
  - In the Databricks Apps runtime, paths like /dbfs/Volumes/... may not be mounted or accessible as standard local paths for direct file operations. This leads to input/output errors even when the volume exists.
- Custom artifact location
  - When using a custom artifact_location (e.g., dbfs:/Volumes/...), you must ensure MLflow supports that location and your client version is ≥ 2.15.0. Otherwise, MLflow may misinterpret the artifact root and fail to write files.
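To tell a missing mount apart from a missing privilege, it can help to probe each candidate directory from inside the app before involving MLflow. The following is a minimal sketch using only the standard library; the function name `probe_write` is hypothetical, and the paths in the usage loop are assumptions taken from the question.

```python
import os
import tempfile


def probe_write(directory):
    """Try to create and delete a small file in `directory`.

    Returns (True, None) on success, or (False, error) where `error`
    is the OSError raised by the filesystem.
    """
    try:
        # NamedTemporaryFile creates a real file and removes it on close,
        # so this exercises the write path an artifact upload would need.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".probe_") as handle:
            handle.write(b"ok")
            handle.flush()
        return True, None
    except OSError as exc:
        return False, exc


if __name__ == "__main__":
    # Candidate paths are assumptions based on the question's setup.
    candidates = (
        "/tmp",
        "/Volumes/testing/model-data/training-data",
        "/dbfs/Volumes",
    )
    for path in candidates:
        ok, err = probe_write(path)
        status = "writable" if ok else f"not writable ({err!r})"
        print(f"{path}: {status}")
```

A FileNotFoundError suggests the path is not mounted in that runtime, while a PermissionError points at missing privileges for the identity the app runs as.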
Fix: Recommended Approaches
Option 1: Use MLflow-managed storage
Remove the custom artifact location and let MLflow assign the default tracked location:
experiment_name = "/Shared/test_run"
mlflow.set_experiment(experiment_name)
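Databricks-managed artifact storage automatically applies proper permissions (stored under dbfs:/databricks/mlflow-tracking/…). Note that an experiment's artifact location is fixed when the experiment is created, so if /Shared/test_run already exists with the custom location, use a new experiment name (or delete the old experiment first) for this change to take effect.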
Option 2: Use a Unity Catalog Volume with proper permissions
If you must use a specific Volume:
- Ensure your catalog, schema, and volume exist, and grant privileges (names that contain hyphens must be wrapped in backticks):
  GRANT USE CATALOG ON CATALOG testing TO `your_user_or_service_principal`;
  GRANT USE SCHEMA ON SCHEMA testing.`model-data` TO `your_user_or_service_principal`;
  GRANT READ VOLUME, WRITE VOLUME ON VOLUME testing.`model-data`.`training-data` TO `your_user_or_service_principal`;
- Use a recent MLflow version (≥ 2.15.0):
  %pip install --upgrade mlflow
- Set artifact_location as:
  artifact_location = f"dbfs:/Volumes/{catalog}/{schema}/{volume}/mlflow_artifacts"
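Because the schema and volume names in the question contain hyphens, the GRANT statements are easy to get wrong by hand. Here is a small sketch that builds them with backtick quoting. The helper names `quote_ident` and `build_volume_grants` are hypothetical, and it assumes the volume privileges to grant are READ VOLUME and WRITE VOLUME; it only produces SQL strings and does not execute anything.

```python
def quote_ident(name):
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_volume_grants(catalog, schema, volume, principal):
    """Return the GRANT statements needed to read and write one volume."""
    c, s, v, p = (quote_ident(x) for x in (catalog, schema, volume, principal))
    return [
        f"GRANT USE CATALOG ON CATALOG {c} TO {p};",
        f"GRANT USE SCHEMA ON SCHEMA {c}.{s} TO {p};",
        f"GRANT READ VOLUME, WRITE VOLUME ON VOLUME {c}.{s}.{v} TO {p};",
    ]


if __name__ == "__main__":
    # The principal name is a placeholder, as in the statements above.
    for stmt in build_volume_grants(
        "testing", "model-data", "training-data", "your_user_or_service_principal"
    ):
        print(stmt)
```

The printed statements can then be run by someone with permission to grant on those objects, for example in a SQL editor or notebook.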
Option 3: Log to temporary local storage, then move manually
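Write the file to a local temp directory like /tmp, log it with MLflow as usual, and then copy it to the volume yourself:
mlflow.log_artifact(temp_csv_path, artifact_path="data")
Follow with (dbutils is available in notebooks; it may not be available inside the Databricks Apps runtime):
dbutils.fs.cp("file:/tmp/mmm_data.csv", f"dbfs:/Volumes/{catalog}/{schema}/{volume}/mmm_data.csv")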
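Where dbutils is not available, the same copy step can be sketched with the standard library, assuming the volume is exposed as a regular /Volumes/... path (as the question's pd.read_csv call suggests). The helper name `copy_to_volume` is hypothetical; this is an illustration of the copy step, not a replacement for MLflow's own artifact handling.

```python
import shutil
from pathlib import Path


def copy_to_volume(local_path, volume_dir):
    """Copy one local file into `volume_dir` and return the destination path.

    Raises RuntimeError with the underlying OSError attached if the
    directory cannot be created or the file cannot be written.
    """
    src = Path(local_path)
    dest_dir = Path(volume_dir)
    try:
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / src.name
        # copyfile copies contents only and does not try to copy
        # permissions or timestamps to the destination.
        shutil.copyfile(src, dest)
    except OSError as exc:
        raise RuntimeError(f"Could not copy {src} to {dest_dir}: {exc}") from exc
    return dest
```

For the question's setup, the call would look like copy_to_volume("/tmp/mmm_data.csv", "/Volumes/testing/model-data/training-data/mlflow_artifacts"), with that target path being an assumption built from the names in the original code.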
Summary of Key Recommendations
| Cause | Fix |
|---|---|
| Lack of permissions on Unity Catalog volume | Grant READ VOLUME and WRITE VOLUME privileges |
| Inaccessible /dbfs within Databricks Apps | Use dbutils.fs.cp or default MLflow artifact store |
| Outdated MLflow version | Upgrade to ≥ 2.15.0 |
| Custom artifact location not supported | Prefer default MLflow-managed locations |
Following these steps should allow your Streamlit app in Databricks to log artifacts without hitting the OSError: [Errno 5] Input/output error.