02-21-2024 01:03 AM
I am trying to find a way to locally download the model artifacts that build a chatbot chain registered with MLflow in Databricks, so that I can preserve the whole structure (chain -> model -> steps -> yaml & pkl files).
There is a mention in a contributed article, but it is not clear what `local_dir` really represents (a path inside DBFS, in a volume, or on the local computer?) or what format it is supposed to have.
Maybe somebody knows the answer 🙂
Thx
02-26-2024 06:03 AM
OK, eventually I found a solution. I'm writing it below in case somebody needs it. Basically, if the local directory passed to the download_artifacts method is an existing, accessible DBFS folder, the process works as expected.
import os
from mlflow.tracking import MlflowClient

# Consider you have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
client = MlflowClient()
local_dir = "/dbfs/FileStore/mydir1" # existing and accessible DBFS folder
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_path))
# expected output print message: Artifacts downloaded in: /dbfs/FileStore/mydir1/chain
02-21-2024 01:51 AM
Hi @Retired_mod and thank you for your answer.
So I have run this piece of code from a Databricks notebook within my workspace.
Literally:
import os
from mlflow.tracking import MlflowClient

# Consider I have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
client = MlflowClient()
local_dir = "mydir"
os.makedirs(local_dir, exist_ok=True)
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_dir))
It runs OK, with the expected output:
Artifacts downloaded in: mydir
The question is, where was mydir created? I cannot find it anywhere (workspace, dbfs, volume...)
Thank you!
02-21-2024 02:20 AM
Hi @Retired_mod and thanks again.
So in my example the artifacts have been downloaded to the local_path, which is /databricks/driver/mydir/chain
From your second explanation at point 1., it turns out that this directory, too, is not directly visible/accessible (The directory “mydir” exists within the Databricks workspace, but it’s not visible in your local filesystem or DBFS.)
It seems then that the only way to get them is to apply paragraph 4., so I proceeded with:
dbutils.fs.mv(local_dir, "/dbfs/mnt/mypath")
and also tried
dbutils.fs.mv(local_path, "/dbfs/mnt/mypath")
but in both cases I got a FileNotFound error saying that local_dir (/mydir) and local_path (/databricks/driver/mydir/chain) do not exist.
Note that in the first error the path shown is /mydir (mydir directly under the root), which may not be right.
In any case, I am still in the same place: I am not able to download the artifacts I intend to. 🙃
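For what it's worth, `dbutils.fs` treats bare paths as DBFS paths, so a driver-local path such as /databricks/driver/mydir/chain is looked up in DBFS (where it does not exist) unless it carries the `file:` scheme. A minimal sketch of the idea; the `to_local_scheme` helper is my own illustration, not a Databricks API:

```python
def to_local_scheme(path: str) -> str:
    """Prefix a driver-local path with the file: scheme so that
    dbutils.fs reads it from the local filesystem, not from DBFS."""
    return path if path.startswith("file:") else "file:" + path

# In a Databricks notebook the move would then become (needs a live cluster):
# dbutils.fs.mv(to_local_scheme("/databricks/driver/mydir/chain"),
#               "dbfs:/mnt/mypath/chain", recurse=True)
```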
02-21-2024 06:00 AM
This is really confusing.
I ran:
dbutils.fs.mkdirs("/databricks/driver/mydir")
dbutils.fs.ls("/databricks/driver")
local_path = client.download_artifacts(run_id, "chain", "mydir")
print("Artifacts downloaded in: {}".format(local_path))
dbutils.fs.ls("/databricks/driver/mydir")
with the result: []
Which means that no artifacts were actually downloaded, or am I missing something?
02-22-2024 04:32 AM
Hi @Retired_mod ,
Indeed the artifacts are in
"/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
and I am able to navigate in the UI at the URL mentioned above, where I can see the artifacts.
So I am not sure why the download apparently succeeds (as seen in the method response), but the final result is not the expected one.
All of the rest you wrote is what I had done.
Now I am thinking of an alternative: is it possible to do the same not from the Databricks notebook, but from a local script?
I am asking because I am not sure what settings I need in place to be able to run
client.download_artifacts(run_id, "chain", "mydir")
As it is, I get an error saying the run_id is not recognized.
Or can the same operation (download_artifacts) be done through a REST API? If so, which one would it be?
Or using the databricks cli?
Thank you!
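Regarding the local-script question: the run_id is typically "not recognized" because the MLflow client defaults to a local tracking store; it needs to be pointed at the Databricks tracking server first, which is usually done through environment variables. A sketch under that assumption (the workspace URL and token are placeholders to fill in, not real values):

```python
import os

# Placeholder credentials; replace with your workspace URL and a personal access token.
os.environ["DATABRICKS_HOST"] = "https://<your-workspace-url>"
os.environ["DATABRICKS_TOKEN"] = "<personal-access-token>"
os.environ["MLFLOW_TRACKING_URI"] = "databricks"

# With the environment configured, the same notebook call should work from a
# local script too (commented out here, since it needs a reachable workspace):
# from mlflow.tracking import MlflowClient
# client = MlflowClient()
# local_path = client.download_artifacts("<run_id>", "chain", "mydir")
```

I believe the MLflow CLI exposes the same operation as `mlflow artifacts download --run-id <run_id> --artifact-path chain`, given the same environment variables.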