Can't load model from UC due to DBFS issue
07-19-2024 11:14 AM
I want to load a model I have registered in Unity Catalog using a Shared cluster, but under the hood it seems to be trying to use DBFS, and it gives me an error.
I am using DBR 13.3 LTS and mlflow-skinny[databricks]==2.14.3
My code:
import mlflow
mlflow.set_registry_uri("databricks-uc")
mlflow.spark.load_model("models:/my_uc_catalog.my_schema.my_model@my_alias")
Throws:
OSError: [Errno 95] Operation not supported: '/dbfs/tmp'
If I run it on a Single User cluster it works fine, but I need it to work on Shared clusters.
07-19-2024 08:14 PM
Hi @migq2 ,
Look at the snippets below from the documentation. It works on a single user cluster because that mode has full access to DBFS.
You can try granting the ANY FILE permission to make it work on a shared cluster.
How does DBFS work in single user access mode?
Clusters configured with single user access mode have full access to DBFS, including all files in the DBFS root and mounted data.
How does DBFS work in shared access mode?
Shared access mode combines Unity Catalog data governance with Azure Databricks legacy table ACLs. Access to data in the hive_metastore is only available to users that have permissions explicitly granted.
To interact with files directly using DBFS, you must have ANY FILE permissions granted. Because ANY FILE allows users to bypass legacy tables ACLs in the hive_metastore and access all data managed by DBFS, Databricks recommends caution when granting this privilege.
You can also try to hack the library a little. Take a look at the Stack Overflow thread. Instead of passing a DBFS path, you could try passing a path to a UC Volume. I don't know if it'll work, but it's worth a try.
from mlflow.utils.databricks_utils import _get_dbutils

def fake_tmp():
    return 'path_to_Volume...'  # something writable, e.g. a UC Volume path

# Shadow the method MLflow calls to resolve its local temp directory
_get_dbutils().entry_point.getReplLocalTempDir = fake_tmp
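The patch works because assigning a plain function to the instance attribute shadows the bound method MLflow would otherwise call. A minimal stand-in illustrating the mechanics (no Databricks required; `EntryPoint` and both paths here are placeholders, not the real Databricks objects):

```python
# Placeholder class standing in for the dbutils entry point object.
class EntryPoint:
    def getReplLocalTempDir(self):
        return "/dbfs/tmp"  # the default that is unwritable on shared clusters

entry_point = EntryPoint()

def fake_tmp():
    # Hypothetical writable UC Volume path; substitute your own.
    return "/Volumes/catalog/schema/volume/tmp"

# Assigning to the instance attribute shadows the class method,
# so subsequent lookups return our function instead.
entry_point.getReplLocalTempDir = fake_tmp

print(entry_point.getReplLocalTempDir())  # now returns the Volume path
```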
07-20-2024 02:10 AM - edited 07-20-2024 02:19 AM
Have you tried telling MLflow to look for models in UC?
mlflow.set_registry_uri("databricks-uc")
Edit: never mind, I see you have already. With that option set it shouldn't do or search for anything on DBFS anymore, so this is a bit strange. The Shared cluster security model doesn't allow interaction with DBFS, and even ANY FILE permissions shouldn't be needed. I wouldn't recommend granting them either.

