Download model artifacts from MLflow

Octavian1
Contributor

I am trying to find a way to locally download the model artifacts that build a chatbot chain registered with MLflow in Databricks, so that I can preserve the whole structure (chain -> model -> steps -> yaml & pkl files).

There is a mention in a contributed article, but it is not clear what `local_dir` really represents (inside dbfs, in the volume, on the local computer?) and what format it is supposed to have.

Maybe somebody knows the answer 🙂

Thx

Accepted Solutions

Octavian1
Contributor

OK, I eventually found a solution. I am posting it below in case somebody needs it. Basically, if the local directory passed to the download_artifacts method is an existing, accessible directory in DBFS, the process works as expected.

from mlflow.tracking import MlflowClient

# Assume the artifacts are stored in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"
client = MlflowClient()
local_dir = "/dbfs/FileStore/mydir1"  # existing and accessible DBFS folder
run_id = "<run_id>"
# Download the "chain" artifact folder of the run into local_dir
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_path))

# Expected output: Artifacts downloaded in: /dbfs/FileStore/mydir1/chain

5 REPLIES

Hi @Retired_mod and thank you for your answer.

So I have run this piece of code from a Databricks notebook within my workspace.

Literally:

import os
from mlflow.tracking import MlflowClient
# Consider I have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"

client = MlflowClient()
local_dir = "mydir"
os.makedirs(local_dir, exist_ok=True)
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_dir))

 It runs OK, with the expected output:

Artifacts downloaded in: mydir

The question is, where was mydir created? I cannot find it anywhere (workspace, dbfs, volume...)

Thank you!
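
A relative path such as "mydir" is resolved against the current working directory of the Python process running the notebook, so the quickest way to find it is to print the absolute path. A minimal sketch (the helper name resolve_download_dir is hypothetical, not part of MLflow); in this thread the result turned out to be under /databricks/driver, which is on the driver node's local disk rather than in DBFS or the workspace:

```python
import os


def resolve_download_dir(local_dir):
    """Return the absolute path that a (possibly relative) directory resolves to."""
    return os.path.abspath(local_dir)


# Show where a relative "mydir" would actually be created.
print("Current working directory:", os.getcwd())
print("'mydir' resolves to:", resolve_download_dir("mydir"))
```

Printing local_path (the return value of download_artifacts) instead of local_dir gives the same information directly.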

Hi @Retired_mod and thanks again.

So in my example the artifacts were downloaded to local_path, which is /databricks/driver/mydir/chain.
From point 1 of your second explanation, it seems that this directory is also not directly visible or accessible ("The directory "mydir" exists within the Databricks workspace, but it's not visible in your local filesystem or DBFS.").

It seems then that the only way to get them is applying paragraph 4., so I proceeded with:

dbutils.fs.mv(local_dir, "/dbfs/mnt/mypath")

and also tried

dbutils.fs.mv(local_path, "/dbfs/mnt/mypath")

but both calls failed with a FileNotFound error saying that the source, local_dir (/mydir) or local_path (/databricks/driver/mydir/chain), does not exist.

Note that the first error shows /mydir (mydir directly under the root), which does not look right.

In any case, I am still in the same place: I am not able to download the artifacts I need. 🙃
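
A likely cause of the FileNotFound errors: dbutils.fs treats a path without a scheme as a DBFS path (dbfs:/...), whereas the download landed on the driver's local disk, which dbutils.fs addresses with the file: scheme. Similarly, "/dbfs/mnt/mypath" is the local-mount spelling of a DBFS location; inside dbutils.fs the same location is written dbfs:/mnt/mypath. A minimal sketch of translating both spellings, assuming that reading is correct. Both helper names are hypothetical, and the dbutils.fs.cp call is left as a comment because dbutils exists only inside Databricks:

```python
def driver_path_to_uri(local_path):
    """Spell an absolute driver-local path the way dbutils.fs expects: file:/abs/path."""
    if not local_path.startswith("/"):
        raise ValueError("expected an absolute path, got: " + local_path)
    return "file:" + local_path


def fuse_path_to_dbfs_uri(fuse_path):
    """Convert a /dbfs/... mount path into the equivalent dbfs:/... URI."""
    prefix = "/dbfs/"
    if not fuse_path.startswith(prefix):
        raise ValueError("expected a path under /dbfs/, got: " + fuse_path)
    return "dbfs:/" + fuse_path[len(prefix):]


src = driver_path_to_uri("/databricks/driver/mydir/chain")
dst = fuse_path_to_dbfs_uri("/dbfs/mnt/mypath")
print(src, "->", dst)

# Inside a Databricks notebook, a recursive copy would then look like:
# dbutils.fs.cp(src, dst, True)
```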

This is really confusing.

I ran:

dbutils.fs.mkdirs("/databricks/driver/mydir")
which gave me the response: True
To check it exists, I ran then:
dbutils.fs.ls("/databricks/driver")
with the response:
[FileInfo(path='dbfs:/databricks/driver/mydir/', name='mydir/', size=0, modificationTime=17...)]
 
then I executed:

 

local_path = client.download_artifacts(run_id, "chain", "mydir")
print("Artifacts downloaded in: {}".format(local_path))

 

with the response:
Artifacts downloaded in: /databricks/driver/mydir/chain
 
Eventually I ran:
dbutils.fs.ls("/databricks/driver/mydir")

with the result: []

Does this mean that no artifacts were actually downloaded, or am I missing something?
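
One possible reading of the empty listing: dbutils.fs.mkdirs and dbutils.fs.ls operated on dbfs:/databricks/driver/mydir (note the dbfs: prefix in the FileInfo above), while download_artifacts wrote to /databricks/driver/mydir/chain on the driver's local disk. These would be two different locations that happen to share a path string. A sketch for inspecting the local side with plain Python instead of dbutils (list_tree is a hypothetical helper):

```python
import os


def list_tree(root):
    """Return the sorted relative paths of all files under root on the local filesystem."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


# In the notebook from this thread, the directory to inspect would be the
# value of local_path, i.e. "/databricks/driver/mydir/chain".
for rel in list_tree("mydir"):
    print(rel)
```

If the yaml and pkl files show up here, the download did succeed and only the listing was looking in the wrong filesystem.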

Hi @Retired_mod ,

Indeed the artifacts are in

"/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"

 and I am able to navigate in the UI at the URL mentioned above, where I can see the artifacts.

So I am not sure why the download apparently succeeds (as seen in the method response), but the final result is not the expected one.

All of the rest you wrote is what I had done.

Now I am thinking of an alternative: is it possible to do the same from a local script rather than from the Databricks notebook?
I am asking because I am not sure what settings I need in place to be able to run

client.download_artifacts(run_id, "chain", "mydir")

As it stands, I get an error message saying that the run_id is not recognized.

Or can the same operation (download_artifacts) be done by calling a REST API? If yes, which would it be?

Or using the databricks cli?

Thank you!
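
For the local-script route, a commonly used setup is to point MLflow at the workspace with the "databricks" tracking URI and to supply credentials through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables. Without that, a local client talks to a local tracking store, which would explain the run_id not being recognized. A minimal sketch under those assumptions, using mlflow.artifacts.download_artifacts; the function name fetch_chain_artifacts and the default destination folder are hypothetical:

```python
import os


def fetch_chain_artifacts(run_id, artifact_path="chain", dst_path="mlflow_artifacts"):
    """Download one artifact folder of a Databricks-hosted run to the local machine."""
    # Fail early with a clear message if the workspace credentials are missing.
    missing = [v for v in ("DATABRICKS_HOST", "DATABRICKS_TOKEN") if not os.environ.get(v)]
    if missing:
        raise RuntimeError("Set these environment variables first: " + ", ".join(missing))

    # Imported lazily so the credential check above works even without mlflow installed.
    import mlflow
    from mlflow.artifacts import download_artifacts

    mlflow.set_tracking_uri("databricks")  # use the workspace tracking server
    os.makedirs(dst_path, exist_ok=True)   # destination on the local computer
    return download_artifacts(run_id=run_id, artifact_path=artifact_path, dst_path=dst_path)


# Example call (needs network access and valid credentials):
# print(fetch_chain_artifacts("<run_id>"))
```

The returned path should contain the same chain folder structure as the one shown in the run's artifact view.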
