02-21-2024 01:03 AM
I am trying to find a way to locally download the model artifacts that make up a chatbot chain registered with MLflow in Databricks, so that I can preserve the whole structure (chain -> model -> steps -> yaml & pkl files).
There is a mention in a contributed article, but it is not clear what `local_dir` actually represents (a path inside DBFS, in a volume, or on the local computer?) and what format it is supposed to have.
Maybe somebody knows the answer 🙂
Thx
02-21-2024 01:26 AM
Hi @Octavian1, When working with MLflow in Databricks, you can download model artifacts to your local storage using the client.download_artifacts method.
Let me explain how it works:
By default, MLflow saves artifacts to an artifact store URI during an experiment. The artifact store URI follows a structure like /dbfs/databricks/mlflow-tracking/<experiment-id>/<run-id>/artifacts/. However, this artifact store is managed by MLflow, and you cannot directly download artifacts from it.
To download artifacts, you must use the client.download_artifacts method, which copies artifacts from the artifact store to another storage location of your choice. You specify the local directory (local_dir) where you want to store the downloaded artifacts.
Here's an example code snippet in Python that demonstrates how to download MLflow artifacts from a specific run and store them locally:
import mlflow
import os
from mlflow.tracking import MlflowClient

# Initialize the MLflow client
client = MlflowClient()

# Specify the local directory where you want to store artifacts
local_dir = "<local-path-to-store-artifacts>"

# Create the local directory if it doesn't exist
if not os.path.exists(local_dir):
    os.mkdir(local_dir)

# Assume you log an artifact named "features.txt" during an MLflow run
features = "rooms, zipcode, median_price, school_rating, transport"
with open("features.txt", "w") as f:
    f.write(features)

# Create a sample MLflow run and log "features.txt" under the "features" path
with mlflow.start_run() as run:
    mlflow.log_artifact("features.txt", artifact_path="features")

# Download the "features" artifact directory from that run to local storage
local_path = client.download_artifacts(run.info.run_id, "features", local_dir)
print(f"Artifacts downloaded in: {local_path}")
After downloading the artifacts to your local storage, you can further copy or move them to an external filesystem or a mount point using standard tools. For example:

dbutils.fs.cp(local_dir, "<filesystem://path-to-store-artifacts>")

or, with Python's shutil:

import shutil
shutil.move(local_dir, "/dbfs/mnt/<path-to-store-artifacts>")
Remember to replace <local-path-to-store-artifacts> with your desired local directory; to download from an existing run, pass that run's ID to download_artifacts instead of run.info.run_id. This way, you can preserve the entire structure of your chatbot chain, including models, steps, and associated files. 🤖📦
For more details, you can refer to the official Databricks documentation on downloading MLflow artifacts. If you have any further questions, feel free to ask! 😊
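Since the original question is about a chain registered with MLflow, it may also be worth noting that artifacts can be fetched directly from the Model Registry by URI. A minimal sketch, assuming MLflow 2.x; the model name "chatbot_chain" and version are placeholders for illustration:

import mlflow

# Download everything logged under a registered model version
local_path = mlflow.artifacts.download_artifacts(
    artifact_uri="models:/chatbot_chain/1",
    dst_path="<local-path-to-store-artifacts>",
)
print(f"Artifacts downloaded in: {local_path}")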
02-21-2024 01:51 AM
Hi @Kaniz and thank you for your answer.
So I have run this piece of code from a Databricks notebook within my workspace.
Literally:
import os
from mlflow.tracking import MlflowClient

# Consider I have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
client = MlflowClient()
local_dir = "mydir"
os.makedirs(local_dir, exist_ok=True)
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_dir))
It runs OK, with the expected output:
Artifacts downloaded in: mydir
The question is, where was mydir created? I cannot find it anywhere (workspace, dbfs, volume...)
Thank you!
02-21-2024 01:56 AM
Hi @Octavian1, The directory "mydir" that you specified in your code is created within the Databricks workspace. However, it's important to understand that this directory is not directly accessible from your local machine or from DBFS (the Databricks File System).
Let me explain further:
1. Workspace Location: When you call os.makedirs(local_dir, exist_ok=True) in your Databricks notebook, the directory is created within the Databricks workspace.
2. Accessing Artifacts: The artifacts you download with client.download_artifacts are stored in the Databricks artifact store, which is managed by MLflow. The artifact path you pass ("chain") corresponds to the artifact path within the run identified by <run_id>.
3. Viewing Artifacts: You can browse the logged artifacts for that run in the MLflow UI.
4. Copying or Moving Artifacts: To make the downloaded files available elsewhere, copy or move them, for example:
dbutils.fs.cp(local_dir, "file:/mnt/<mount-point>/<path-to-store-artifacts>")
dbutils.fs.mv(local_dir, "/dbfs/mnt/<path-to-store-artifacts>")
Remember that the "mydir" directory is a temporary workspace location within Databricks, and you'll need to take additional steps to make the artifacts accessible in other environments. If you have specific requirements for where you want to store the artifacts, consider using an appropriate mount point or external storage location. 📁🔍🚀
For more details, you can refer to the Databricks documentation on interacting with workspace files.
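One detail that may help with the "where is mydir?" question: a relative path resolves against the driver's working directory, while dbutils.fs defaults to the dbfs:/ scheme, so the two views can disagree. A minimal sketch for checking both (dbutils is only available inside a Databricks notebook):

import os

local_dir = "mydir"

# The driver-local view: relative paths resolve against the working
# directory, typically /databricks/driver on a Databricks cluster
print(os.path.abspath(local_dir))   # e.g. /databricks/driver/mydir
print(os.listdir(local_dir))        # files on the driver's local disk

# The DBFS view: dbutils.fs paths default to the dbfs:/ scheme, so this
# inspects dbfs:/databricks/driver/mydir, NOT the driver-local directory:
# dbutils.fs.ls("/databricks/driver/mydir")
# Use the file:/ scheme to see the driver's local filesystem via dbutils:
# dbutils.fs.ls("file:/databricks/driver/mydir")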
02-21-2024 02:20 AM
Hi @Kaniz and thanks again.
So in my example the artifacts have been downloaded to the local_path, which is /databricks/driver/mydir/chain.
From point 1 of your second explanation, it turns out that this directory too is not directly visible/accessible (the directory "mydir" exists within the Databricks workspace, but it's not visible in the local filesystem or DBFS).
It seems then that the only way to get them is to apply point 4, so I proceeded with:
dbutils.fs.mv(local_dir, "/dbfs/mnt/mypath")
and also tried
dbutils.fs.mv(local_path, "/dbfs/mnt/mypath")
but in both cases there was an error saying that local_dir (/mydir) and local_path (/databricks/driver/mydir/chain) do not exist (FileNotFound).
Note that in the first error the path is shown as /mydir (mydir directly under the root), which may not be right.
In any case, I am still in the same place: I am not able to download the artifacts I am after. 🙃
02-21-2024 06:00 AM
This is really confusing.
I ran:
dbutils.fs.mkdirs("/databricks/driver/mydir")
dbutils.fs.ls("/databricks/driver")
local_path = client.download_artifacts(run_id, "chain", "mydir")
print("Artifacts downloaded in: {}".format(local_path))
dbutils.fs.ls("/databricks/driver/mydir")
with the result: []
Which means that no artifacts were actually downloaded, or am I missing something?
02-22-2024 02:21 AM
Hi @Octavian1, I apologize for the confusion you're experiencing.
Let's break down the steps and troubleshoot the issue:
1. Creating the Directory: You created the directory with dbutils.fs.mkdirs("/databricks/driver/mydir").
2. Listing Contents of "/databricks/driver": When you ran dbutils.fs.ls("/databricks/driver"), it showed that the directory "mydir" exists within "/databricks/driver"; the entry FileInfo(path='dbfs:/databricks/driver/mydir/', name='mydir/', size=0, modificationTime=17...) confirms its existence.
3. Downloading Artifacts: You called client.download_artifacts(run_id, "chain", "mydir") to download artifacts from the specified run.
4. Listing Contents Again: When you ran dbutils.fs.ls("/databricks/driver/mydir") again, it returned an empty result.
5. Possible Issue: It may be that no artifacts were logged under the "chain" path for the given <run-id>.
6. Double-Check Artifact Path: Verify in the MLflow UI that the run actually contains an artifact directory named "chain".
7. Copying Artifacts to DBFS: Try dbutils.fs.cp("file:/databricks/driver/mydir/chain", "dbfs:/mnt/mypath"), replacing "/mnt/mypath" with the actual DBFS path where you want to store the artifacts.
8. Verify in DBFS: Check the target location (/dbfs/mnt/mypath) to verify that the artifacts are accessible in DBFS.
Remember that the "mydir" directory is a temporary workspace location within Databricks. By copying the artifacts to DBFS, you'll make them available for further use. If you encounter any issues during this process, please let me know, and we'll continue troubleshooting! 🚀🔍📦
For more information, you can refer to the Databricks documentation on interacting with workspace files.
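One quick way to carry out point 6 is to list the run's artifacts programmatically before downloading. A small sketch using the standard MlflowClient API, with run_id and "chain" as in this thread:

from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = "<run_id>"

# Each entry has .path and .is_dir; top level first
for artifact in client.list_artifacts(run_id):
    print(artifact.path, "(dir)" if artifact.is_dir else "(file)")

# Then drill into the "chain" directory, if present
for artifact in client.list_artifacts(run_id, "chain"):
    print(artifact.path)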
02-22-2024 04:32 AM
Hi @Kaniz ,
Indeed the artifacts are in
"/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
and I am able to navigate in the UI to the URL mentioned above, where I can see the artifacts.
So I am not sure why the download apparently succeeds (judging by the method's response), but the final result is not the expected one.
Everything else you wrote is what I had already done.
Now I am thinking of an alternative: is it possible to do the same not from the Databricks notebook, but from a local script?
I am asking because I am not sure what settings I need in place to be able to run
client.download_artifacts(run_id, "chain", "mydir")
As it is, I get an error about the run_id not being recognized.
Or can the same operation (download_artifacts) be done by calling a REST API? If yes, which one would it be?
Or using the Databricks CLI?
Thank you!
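For reference, a minimal sketch of the local-script variant asked about here, assuming a Databricks personal access token; the host, token, and paths are placeholders:

import os
import mlflow
from mlflow.tracking import MlflowClient

# Point MLflow at the Databricks workspace; the host and token can also
# be configured in ~/.databrickscfg instead of environment variables
os.environ["DATABRICKS_HOST"] = "https://<your-workspace-url>"
os.environ["DATABRICKS_TOKEN"] = "<personal-access-token>"
mlflow.set_tracking_uri("databricks")

client = MlflowClient()
local_path = client.download_artifacts("<run_id>", "chain", "<local-dir>")
print(f"Artifacts downloaded in: {local_path}")

The MLflow CLI offers an equivalent command, which should behave the same way once the environment variables above are set: mlflow artifacts download --run-id <run_id> --artifact-path chain --dst-path <local-dir>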
02-26-2024 06:03 AM
OK, eventually I found a solution. I am writing it below in case somebody needs it. Basically, if the local directory passed to the download_artifacts method is an existing and accessible one in DBFS, the process works as expected.
import os
from mlflow.tracking import MlflowClient

# Consider you have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
client = MlflowClient()
local_dir = "/dbfs/FileStore/mydir1"  # existing and accessible DBFS folder
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_path))
# expected output: Artifacts downloaded in: /dbfs/FileStore/mydir1/chain
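A likely reason this works, assuming standard Databricks behavior: paths under /dbfs/ go through the DBFS FUSE mount on the driver, so download_artifacts writes directly into DBFS rather than onto the driver's ephemeral local disk. A quick verification sketch, using the paths from the post above:

import os

# Driver-side FUSE view of the downloaded artifacts
print(os.listdir("/dbfs/FileStore/mydir1/chain"))

# Equivalent notebook check through dbutils:
# dbutils.fs.ls("dbfs:/FileStore/mydir1/chain")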