02-21-2024 01:03 AM
I am trying to find a way to locally download the model artifacts that build a chatbot chain registered with MLflow in Databricks, so that I can preserve the whole structure (chain -> model -> steps -> yaml & pkl files).
There is a mention in a contributed article, but it is not clear what `local_dir` really represents (inside dbfs, in the volume, on the local computer?) and what format it is supposed to have.
Maybe somebody knows the answer.
Thx
02-21-2024 01:26 AM
Hi @Octavian1, When working with MLflow in Databricks, you can download model artifacts to your local storage using the client.download_artifacts method.
Let me explain how it works:
By default, MLflow saves artifacts to an artifact store URI during an experiment. The artifact store URI follows a structure like /dbfs/databricks/mlflow-tracking/<experiment-id>/<run-id>/artifacts/. However, this artifact store is managed by MLflow, and you cannot directly download artifacts from it.
To download artifacts, you must use the client.download_artifacts method. This method allows you to copy artifacts from the artifact store to another storage location of your choice. You specify the local directory (local_dir) where you want to store the downloaded artifacts.
Here's an example code snippet in Python that demonstrates how to download MLflow artifacts from a specific run and store them locally:
import os

import mlflow
from mlflow.tracking import MlflowClient

# Initialize MLflow client
client = MlflowClient()
# Specify the local directory where you want to store artifacts
local_dir = "<local-path-to-store-artifacts>"
# Create the local directory (and any missing parents) if it doesn't exist
os.makedirs(local_dir, exist_ok=True)
# Write a sample file that will be logged as the artifact "features.txt"
features = "rooms, zipcode, median_price, school_rating, transport"
with open("features.txt", "w") as f:
    f.write(features)
# Create a sample MLflow run and log the artifact "features.txt"
with mlflow.start_run() as run:
    mlflow.log_artifact("features.txt", artifact_path="features")
# Use the run just created, or set this to the "<run-id>" of an existing run
run_id = run.info.run_id
# Download the artifact to local storage
local_path = client.download_artifacts(run_id, "features", local_dir)
print(f"Artifacts downloaded in: {local_path}")
After downloading the artifacts to your local storage, you can further copy or move them to an external filesystem or a mount point using standard tools. For example:
%scala dbutils.fs.cp(local_dir, "<filesystem://path-to-store-artifacts>")
shutil.move(local_dir, "/dbfs/mnt/<path-to-store-artifacts>")
Remember to replace <local-path-to-store-artifacts> with your desired local directory and <run-id> with the actual run ID of your specified MLflow run. This way, you can preserve the entire structure of your chatbot chain, including models, steps, and associated files.
For more details, you can refer to the official Databricks documentation on downloading MLflow artifacts. If you have any further questions, feel free to ask!
02-21-2024 01:51 AM
Hi @Kaniz_Fatma and thank you for your answer.
So I have run this piece of code from a Databricks notebook within my workspace.
Literally:
import os
# Consider I have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
client = MlflowClient()
local_dir = "mydir"
os.makedirs(local_dir, exist_ok=True)
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_dir))
It runs OK, with the expected output:
Artifacts downloaded in: mydir
The question is, where was mydir created? I cannot find it anywhere (workspace, dbfs, volume...)
Thank you!
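A relative path such as "mydir" is resolved by Python against the current working directory of the notebook process, so one way to locate the folder is to print its absolute form. The sketch below is only an illustration: resolve_local_dir is a hypothetical helper name, not part of MLflow or Databricks, and the actual working directory depends on the cluster and runtime.

```python
import os


def resolve_local_dir(local_dir):
    """Return the absolute path that a (possibly relative) local_dir refers to.

    Hypothetical helper for illustration only.
    """
    # Relative paths are resolved against the process's current working directory.
    return os.path.abspath(local_dir)


print("Working directory:", os.getcwd())
print("'mydir' resolves to:", resolve_local_dir("mydir"))
```

Printing the value returned by download_artifacts (local_path) rather than the local_dir argument gives the same information for the downloaded files.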
02-21-2024 01:56 AM
Hi @Octavian1, The directory "mydir" that you specified in your code is created within the Databricks workspace. However, it's important to understand that this directory is not directly accessible from your local machine or the DBFS (Databricks File System).
Let me explain further:
1. Workspace Location: When you call os.makedirs(local_dir, exist_ok=True) in your Databricks notebook, the directory is created within the Databricks workspace.
2. Accessing Artifacts: [...] client.download_artifacts are stored in the Databricks artifact store, which is managed by MLflow. [...] ("chain") corresponds to the artifact path within the run identified by <run_id>.
3. Viewing Artifacts: [...]
4. Copying or Moving Artifacts: [...]
dbutils.fs.cp(local_dir, "file:/mnt/<mount-point>/<path-to-store-artifacts>")
dbutils.fs.mv(local_dir, "/dbfs/mnt/<path-to-store-artifacts>")
Remember that the "mydir" directory is a temporary workspace location within Databricks, and you'll need to take additional steps to make the artifacts accessible in other environments. If you have specific requirements for where you want to store the artifacts, consider using an appropriate mount point or external storage location.
For more details, you can refer to the Databricks documentation on interacting with workspace files.
02-21-2024 02:20 AM
Hi @Kaniz_Fatma and thanks again.
So in my example the artifacts have been downloaded to the local_path, which is /databricks/driver/mydir/chain
From point 1 of your second explanation, it seems that this directory is also not directly visible or accessible ("The directory "mydir" exists within the Databricks workspace, but it's not visible in your local filesystem or DBFS.").
It seems then that the only way to get them is applying paragraph 4., so I proceeded with:
dbutils.fs.mv(local_dir, "/dbfs/mnt/mypath")
and also tried
dbutils.fs.mv(local_path, "/dbfs/mnt/mypath")
but in both cases there was an error (FileNotFound) saying that the path does not exist, for local_dir (/mydir) as well as for local_path (/databricks/driver/mydir/chain).
Note that in the first case the error shows /mydir (mydir directly under the root), which may not be correct.
In any case, I am still in the same place: I am not able to download the artifacts I want.
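A possible explanation for the FileNotFound errors, consistent with the dbfs:/ paths that dbutils.fs.ls reports later in this thread: dbutils.fs treats a path without a scheme as a DBFS path, so files on the driver's local disk need a file:/ prefix, and a /dbfs/... path needs to be written as dbfs:/.... The helper below is a hypothetical sketch of that mapping (to_dbutils_uri is not a Databricks API), and the commented-out dbutils.fs.cp call assumes it runs inside a Databricks notebook where dbutils is available.

```python
import os


def to_dbutils_uri(path):
    """Add an explicit scheme to a path before handing it to dbutils.fs.

    Hypothetical helper for illustration only.
    """
    if path.startswith(("dbfs:/", "file:/")) or "://" in path:
        return path  # already carries a scheme, leave untouched
    if path == "/dbfs" or path.startswith("/dbfs/"):
        # /dbfs/... is how DBFS appears to local file APIs; dbutils.fs expects dbfs:/...
        return "dbfs:/" + path[len("/dbfs/"):]
    # Anything else is assumed to live on the driver's local disk.
    return "file:" + os.path.abspath(path)


print(to_dbutils_uri("/databricks/driver/mydir/chain"))
print(to_dbutils_uri("/dbfs/mnt/mypath"))

# Inside a Databricks notebook (assumption: dbutils is defined there):
# dbutils.fs.cp(to_dbutils_uri(local_path), to_dbutils_uri("/dbfs/mnt/mypath"), recurse=True)
```

With this mapping, the move attempted above would read from file:/databricks/driver/mydir/chain and write to dbfs:/mnt/mypath.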
02-21-2024 06:00 AM
This is really confusing.
I ran:
dbutils.fs.mkdirs("/databricks/driver/mydir")
dbutils.fs.ls("/databricks/driver")
local_path = client.download_artifacts(run_id, "chain", "mydir")
print("Artifacts downloaded in: {}".format(local_path))
dbutils.fs.ls("/databricks/driver/mydir")
with the result: []
Does this mean that no artifacts were actually downloaded, or am I missing something?
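One possible reading of the empty listing: dbutils.fs.mkdirs and dbutils.fs.ls operate on DBFS (that is, dbfs:/databricks/driver/mydir), while download_artifacts with a relative "mydir" writes to the driver's local disk. Under that assumption, the two calls look at different filesystems, and listing the local directory with plain Python should show the downloaded files. list_local_files is a hypothetical helper name used only for this sketch.

```python
import os


def list_local_files(root):
    """Return the sorted relative paths of all files below root on the local disk.

    Hypothetical helper for illustration only; returns [] if root does not exist.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


# Path taken from the thread; on another cluster, use the value returned by download_artifacts.
print(list_local_files("/databricks/driver/mydir"))
```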
02-22-2024 02:21 AM
Hi @Octavian1, I apologize for the confusion you're experiencing.
Let's break down the steps and troubleshoot the issue:
1. Creating the Directory: [...] dbutils.fs.mkdirs("/databricks/driver/mydir").
2. Listing Contents of "/databricks/driver": [...] dbutils.fs.ls("/databricks/driver"), it showed that the directory "mydir" exists within "/databricks/driver". [...] FileInfo(path='dbfs:/databricks/driver/mydir/', name='mydir/', size=0, modificationTime=17...) confirms its existence.
3. Downloading Artifacts: [...] client.download_artifacts(run_id, "chain", "mydir") to download artifacts from the specified run.
4. Listing Contents Again: [...] dbutils.fs.ls("/databricks/driver/mydir") again, it returned an empty result.
5. Possible Issue: [...] <run-id>.
6. Double-Check Artifact Path: [...]
7. Copying Artifacts to DBFS: [...]
dbutils.fs.cp("file:/databricks/driver/mydir/chain", "dbfs:/mnt/mypath")
Replace "/mnt/mypath" with the actual DBFS path where you want to store the artifacts.
8. Verify in DBFS: [...] /dbfs/mnt/mypath) to verify that the artifacts are accessible in DBFS.
Remember that the "mydir" directory is a temporary workspace location within Databricks. By copying the artifacts to DBFS, you'll make them available for further use. If you encounter any issues during this process, please let me know, and we'll continue troubleshooting!
For more information, you can refer to the Databricks documentation on interacting with workspace f...
02-22-2024 04:32 AM
Hi @Kaniz_Fatma ,
Indeed the artifacts are in
"/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
and I am able to navigate in the UI at the URL mentioned above, where I can see the artifacts.
So I am not sure why the download apparently succeeds (as seen in the method response), but the final result is not the expected one.
All of the rest you wrote is what I had done.
Now I am thinking of an alternative: is it possible to do the same from a local script instead of the Databricks notebook?
I am asking because I am not sure what settings I need in place to be able to run
client.download_artifacts(run_id, "chain", "mydir")
As it stands, I get an error message saying that the run_id is not recognized.
Or can the same operation (download_artifacts) be done by calling a REST API? If yes, which would it be?
Or using the databricks cli?
Thank you!
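For the local-script variant, a minimal sketch follows. It assumes that MLflow is installed on the local machine, that the workspace URL and a personal access token are supplied through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, and that the tracking URI is set to "databricks"; a run ID that is not recognized may simply mean the client is still pointing at a local tracking store. The helper names missing_databricks_env and download_run_artifacts are hypothetical.

```python
import os


def missing_databricks_env(env=None):
    """Return the names of the required Databricks variables that are not set.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    required = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
    return [name for name in required if not env.get(name)]


def download_run_artifacts(run_id, artifact_path, dst_path):
    """Download one artifact folder of a run from a Databricks-hosted tracking server."""
    # Imported here so the environment check above works without MLflow installed.
    import mlflow.artifacts

    # Point the client at the Databricks workspace instead of a local ./mlruns store.
    mlflow.set_tracking_uri("databricks")
    os.makedirs(dst_path, exist_ok=True)
    return mlflow.artifacts.download_artifacts(
        run_id=run_id, artifact_path=artifact_path, dst_path=dst_path
    )


print("Missing settings:", missing_databricks_env())

# Example call once the variables are set (run ID left as a placeholder):
# local_path = download_run_artifacts("<run_id>", "chain", "mydir")
```

Run this way, "mydir" is a folder on the local computer, relative to the directory the script is started from.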
02-26-2024 06:03 AM
OK, I eventually found a solution. I am posting it below in case somebody needs it. Basically, if the local directory passed to the download_artifacts method is an existing, accessible directory in DBFS, the process works as expected.
from mlflow.tracking import MlflowClient

# Consider you have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
client = MlflowClient()
local_dir = "/dbfs/FileStore/mydir1"  # existing and accessible DBFS folder
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_path))
# expected output print message: Artifacts downloaded in: /dbfs/FileStore/mydir1/chain