Databricks Community

Octavian1 · ‎02-21-2024

I am trying to find a way to locally download the model artifacts that build a chatbot chain registered with MLflow in Databricks, so that I can preserve the whole structure (chain -> model -> steps -> yaml & pkl files).

There is a mention in a contributed article, but it is not clear what `local_dir` really represents (inside dbfs, in the volume, on the local computer?) and what format it is supposed to have.

Maybe somebody knows the answer 🙂

Thx

Octavian1 · ‎02-21-2024

Hi @Retired_mod and thank you for your answer.

So I have run this piece of code from a Databricks notebook within my workspace.

Literally:

import os
from mlflow.tracking import MlflowClient

# Consider I have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"

client = MlflowClient()
local_dir = "mydir"  # relative path
os.makedirs(local_dir, exist_ok=True)
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_dir))

It runs OK, with the expected output:

Artifacts downloaded in: mydir

The question is, where was mydir created? I cannot find it anywhere (workspace, dbfs, volume...)

Thank you!
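
A relative local_dir such as "mydir" is resolved against the current working directory of the Python process running the notebook, not against DBFS or a volume (a later reply in this thread reports /databricks/driver/mydir/chain as the resulting path). A minimal sketch for printing where a relative directory actually lands; resolve_download_dir is a hypothetical helper name, not part of MLflow:

```python
import os


def resolve_download_dir(local_dir):
    """Return the absolute path that a (possibly relative) directory refers to."""
    # Relative paths are resolved against the process's current working directory.
    return os.path.abspath(local_dir)


print("Current working directory:", os.getcwd())
print("'mydir' resolves to:", resolve_download_dir("mydir"))
```

Printing local_path (the return value of download_artifacts) instead of local_dir gives the same information after the download.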

Octavian1 · ‎02-21-2024

Hi @Retired_mod and thanks again.

So in my example the artifacts have been downloaded to local_path, which is /databricks/driver/mydir/chain.
From point 1 of your second explanation, it follows that this directory is also not directly visible or accessible (The directory “mydir” exists within the Databricks workspace, but it’s not visible in your local filesystem or DBFS.)

It seems, then, that the only way to get them is to apply point 4, so I proceeded with:

dbutils.fs.mv(local_dir, "/dbfs/mnt/mypath")

and also tried

dbutils.fs.mv(local_path, "/dbfs/mnt/mypath")

but both calls failed with a FileNotFound error saying that the source does not exist: local_dir (/mydir) in the first case and local_path (/databricks/driver/mydir/chain) in the second.

Note that the first error shows /mydir (mydir directly under the root), which may not be correct.

In any case, I am still stuck: I am not able to download the artifacts I want. 🙃
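
One possible explanation for the FileNotFound errors, offered as an assumption rather than a confirmed diagnosis: dbutils.fs commands treat a path without a scheme as a DBFS path, so a directory on the driver's local disk has to be addressed with the file: scheme, and a DBFS destination is written as dbfs:/... rather than with the /dbfs prefix. The sketch below only builds the two URIs. The names to_file_uri and to_dbfs_uri are hypothetical helpers, and the dbutils.fs.cp call is left commented out because dbutils exists only inside a Databricks notebook.

```python
import os


def to_file_uri(local_path):
    """Build a file: URI for a path on the driver's local disk."""
    return "file:" + os.path.abspath(local_path)


def to_dbfs_uri(path):
    """Build a dbfs: URI, dropping a leading /dbfs prefix if present."""
    if path == "/dbfs" or path.startswith("/dbfs/"):
        path = path[len("/dbfs"):]
    return "dbfs:/" + path.lstrip("/")


src = to_file_uri("/databricks/driver/mydir/chain")
dst = to_dbfs_uri("/dbfs/mnt/mypath")
print(src)
print(dst)

# Inside a Databricks notebook, a recursive copy could then look like this:
# dbutils.fs.cp(src, dst, True)
```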

Octavian1 · ‎02-21-2024

This is really confusing.

I ran:

dbutils.fs.mkdirs("/databricks/driver/mydir")

which gave me the response: True

To check that it exists, I then ran:

dbutils.fs.ls("/databricks/driver")

with the response:

[FileInfo(path='dbfs:/databricks/driver/mydir/', name='mydir/', size=0, modificationTime=17...)]

then I executed:

local_path = client.download_artifacts(run_id, "chain", "mydir")
print("Artifacts downloaded in: {}".format(local_path))

with the response:

Artifacts downloaded in: /databricks/driver/mydir/chain

Finally I ran:

dbutils.fs.ls("/databricks/driver/mydir")

with the result: []

Does this mean that no artifacts were actually downloaded, or am I missing something?
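
A possible reading of this result, stated as an assumption: dbutils.fs.mkdirs and dbutils.fs.ls were given paths without a scheme, so they operated on dbfs:/databricks/driver/mydir (as the FileInfo output suggests), while download_artifacts wrote to the driver's local disk at /databricks/driver/mydir/chain. Under that assumption, the empty listing does not show that the download failed. A minimal sketch for inspecting the local directory with plain Python file APIs instead; list_tree is a hypothetical helper name:

```python
import os


def list_tree(root):
    """Return the sorted relative paths of all files below root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


# Path reported by download_artifacts in the post above.
local_path = "/databricks/driver/mydir/chain"
if os.path.isdir(local_path):
    for rel in list_tree(local_path):
        print(rel)
else:
    print("Not found on this machine's local disk:", local_path)
```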

Octavian1 · ‎02-22-2024

Hi @Retired_mod ,

Indeed the artifacts are in

"/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"

and I am able to navigate in the UI at the URL mentioned above, where I can see the artifacts.

So I am not sure why the download apparently succeeds (as seen in the method response), but the final result is not the expected one.

Everything else you described is what I had already done.

Now I am thinking of an alternative: is it possible to do the same from a local script instead of a Databricks notebook?
I am asking because I am not sure what settings I need in place to be able to run

client.download_artifacts(run_id, "chain", "mydir")

As it stands, I get an error message saying that the run_id is not recognized.

Or can the same operation (download_artifacts) be done by calling a REST API? If yes, which one would it be?

Or by using the Databricks CLI?

Thank you!
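
For the local-script question, a minimal sketch under stated assumptions: the mlflow package is installed locally, and the workspace URL and a personal access token are provided through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables (a Databricks CLI profile is another way to authenticate, and this sketch's environment check would not see it). The run_id error may simply mean that the local client was pointed at the default local tracking store rather than the workspace. The names download_chain and missing_databricks_env are hypothetical helpers.

```python
import os

REQUIRED_ENV = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_env(env=os.environ):
    """Return the names of the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def download_chain(run_id, dst_dir):
    """Download the "chain" artifact folder of a run into dst_dir on this machine."""
    missing = missing_databricks_env()
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Imported here so the environment check above works without mlflow installed.
    import mlflow
    from mlflow.artifacts import download_artifacts

    # Point the client at the Databricks-hosted tracking server
    # instead of the default local ./mlruns store.
    mlflow.set_tracking_uri("databricks")

    os.makedirs(dst_dir, exist_ok=True)
    return download_artifacts(run_id=run_id, artifact_path="chain", dst_path=dst_dir)


# Example call (replace the placeholder with a real run ID):
# print(download_chain("<run_id>", "mydir"))
```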

Octavian1 · ‎02-26-2024

OK, I eventually found a solution. I am posting it below in case somebody needs it. Basically, if the local directory passed to the download_artifacts method is an existing and accessible DBFS directory (referenced through the /dbfs path), the process works as expected.

from mlflow.tracking import MlflowClient

# Consider you have the artifacts in "/dbfs/databricks/mlflow-tracking/<id>/<run_id>/artifacts/chain"

client = MlflowClient()
local_dir = "/dbfs/FileStore/mydir1"  # existing and accessible DBFS folder
run_id = "<run_id>"
local_path = client.download_artifacts(run_id, "chain", local_dir)
print("Artifacts downloaded in: {}".format(local_path))

# expected output print message: Artifacts downloaded in: /dbfs/FileStore/mydir1/chain

Download model artifacts from MLflow
