
Can't load model from UC due to DBFS issue

migq2
New Contributor III

I want to load a model I have registered in Unity Catalog using a Shared cluster, but under the hood it seems to be using DBFS, and it gives me an error.

I am using DBR 13.3 LTS and mlflow-skinny[databricks]==2.14.3

My code:

import mlflow

mlflow.set_registry_uri("databricks-uc")

mlflow.spark.load_model("models:/my_uc_catalog.my_schema.my_model@my_alias")

Throws:

OSError: [Errno 95] Operation not supported: '/dbfs/tmp'

If I run it on a Single User cluster it works fine, but I need it to work on Shared clusters.


szymon_dybczak
Contributor III

Hi @migq2 ,

Take a look at the snippets from the documentation below. It works on a single user cluster because that mode has full access to DBFS.

You can try granting the ANY FILE permission to make it work on a shared cluster.

How does DBFS work in single user access mode?

Clusters configured with single user access mode have full access to DBFS, including all files in the DBFS root and mounted data.

How does DBFS work in shared access mode?

Shared access mode combines Unity Catalog data governance with Azure Databricks legacy table ACLs. Access to data in the hive_metastore is only available to users that have permissions explicitly granted.

To interact with files directly using DBFS, you must have ANY FILE permissions granted. Because ANY FILE allows users to bypass legacy tables ACLs in the hive_metastore and access all data managed by DBFS, Databricks recommends caution when granting this privilege.
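If you do go the ANY FILE route, the grant looks roughly like this. A sketch only: the principal name is a placeholder, and you would need permissions to issue grants in the workspace.

```python
# Sketch: granting ANY FILE so a shared cluster can interact with DBFS paths.
# The principal below is a placeholder; substitute a real user or group.
grant_stmt = "GRANT SELECT ON ANY FILE TO `some_user@example.com`"

# On Databricks you would run it from a notebook with:
#   spark.sql(grant_stmt)
```

As the documentation quoted above warns, this bypasses legacy table ACLs in the hive_metastore, so grant it with caution.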

 

You can also try to hack this library a little bit; take a look at the Stack Overflow thread below. Instead of passing a DBFS path, you could try passing a path to a UC Volume. I don't know if it'll work, but it's worth a try.

from mlflow.utils.databricks_utils import _get_dbutils

def fake_tmp():
    return 'path_to_Volume...'  # something writable, e.g. a UC Volume path

# Monkey-patch the helper MLflow uses to resolve its local temp directory
_get_dbutils().entry_point.getReplLocalTempDir = fake_tmp

 

https://stackoverflow.com/questions/77579396/databricks-11-mlflow-error-permission-denied-in-create-...
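A related idea, sketched here rather than taken from that thread: `mlflow.spark.load_model` accepts a `dfs_tmpdir` argument that controls the temporary directory used while loading, so you may be able to point it at a Volume without patching anything. The Volume path below is hypothetical.

```python
# Sketch: redirect MLflow's temp directory via the dfs_tmpdir parameter
# instead of monkey-patching. Untested on a Shared cluster.
model_uri = "models:/my_uc_catalog.my_schema.my_model@my_alias"

# Hypothetical UC Volume path; the Volume must exist and be writable by you.
dfs_tmpdir = "/Volumes/my_uc_catalog/my_schema/my_volume/mlflow_tmp"

# On the cluster you would then run:
#   import mlflow
#   mlflow.set_registry_uri("databricks-uc")
#   model = mlflow.spark.load_model(model_uri, dfs_tmpdir=dfs_tmpdir)
```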

jacovangelder
Honored Contributor

Have you tried telling MLflow to look for models in UC?

 

mlflow.set_registry_uri("databricks-uc")

 

Edit: never mind, I see you already have. With this option set it shouldn't search for anything on DBFS anymore, so this is a bit strange. The Shared cluster security model doesn't allow interaction with DBFS, and the ANY FILE permission shouldn't even be needed. I wouldn't recommend it either.

 
