Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

MLlib load from UC Volume: IllegalArgumentException: Cannot access the UC Volume path...

stiaangerber
New Contributor III

I'm trying to store MLlib instances in Unity Catalog Volumes. I think volumes are a great way to keep things organized.

I can save to a volume without any issues and I can access the data using spark.read and with plain python open(). However, when I try to load a saved MLlib instance using MLReader.load, I get:

 

IllegalArgumentException: Cannot access the UC Volume path from this location.

 

See the attached demo. I'm a metastore admin and owner of the volume.

Am I missing something? Is this not supported (yet)? Can I make it work somehow?

I've tested this on DBR 13.3 and 14.2, same thing. Any help will be much appreciated.

4 REPLIES

Kaniz_Fatma
Community Manager

Hi @stiaangerber, It appears that you’re encountering an issue when loading a saved MLlib instance from a Unity Catalog (UC) Volume in Databricks. 

 

Let’s explore this further:

  • Check Workspace Permissions:
    • Ensure that your workspace permissions are correctly set. As a metastore admin and volume owner, you should have sufficient privileges, but it is worth double-checking.
  • DBR Version Compatibility:
    • You mentioned testing on DBR 13.3 and 14.2. Verify if there are any known issues related to UC Volumes and MLlib in these specific DBR versions.
    • Sometimes, specific features or integrations might not be fully supported in certain DBR versions.
  • Alternative Approach:
    • If loading directly from UC Volumes doesn’t work, consider copying the MLlib instance to a different location (e.g., DBFS) and then loading it.
    • You can use dbutils.fs.cp to copy the saved model from UC Volume to DBFS.

Future Enhancements:

  • UC Volumes are still in preview, and Databricks continually improves its features.
  • It’s possible that future releases will address this limitation or provide better support for MLlib instances within UC Volumes.

Remember that Databricks evolves rapidly, and sometimes certain features might not be fully matured in preview versions.


 

Thanks @Kaniz_Fatma. For now, I implemented a workaround. For others with a similar issue, this works:

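import uuid

def load_model(mlclass, uc_vol_path):
    # Stage the model under a unique DBFS directory, since load() fails on the UC Volume path.
    tmpdir = f'/FileStore/{uuid.uuid4().hex}'
    try:
        dbutils.fs.cp(uc_vol_path, tmpdir, recurse=True)
        return mlclass.load(tmpdir)
    finally:
        # Remove the staged copy whether or not the load succeeded.
        dbutils.fs.rm(tmpdir, recurse=True)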

dct2 = load_model(DCT, '/Volumes/demo/default/vol/dct')

stiaangerber
New Contributor III

Actually, the above is no good for models with associated data: you can only delete the tmpdir after you're done using the model.
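
One way to handle that caveat is to tie the cleanup to a `with` block, so the staged copy is removed only once the caller has finished with the model. This is a rough sketch, not something tested on a cluster: `staged_model` is a made-up name, and `fs` is assumed to be a `dbutils.fs`-like object with `cp()` and `rm()` (in a notebook you would pass `dbutils.fs`).

```python
import uuid
from contextlib import contextmanager


@contextmanager
def staged_model(mlclass, uc_vol_path, fs, staging_root="/FileStore"):
    """Copy a saved MLlib instance out of a UC Volume, load it, and keep
    the staged copy until the caller leaves the `with` block.

    `fs` is an assumption: any object exposing cp() and rm() like dbutils.fs.
    """
    tmpdir = f"{staging_root}/{uuid.uuid4().hex}"
    try:
        fs.cp(uc_vol_path, tmpdir, recurse=True)
        yield mlclass.load(tmpdir)
    finally:
        # Runs when the `with` block exits, so data the model reads
        # later is still on disk while the model is in use.
        fs.rm(tmpdir, recurse=True)


# Hypothetical notebook usage:
# with staged_model(DCT, '/Volumes/demo/default/vol/dct', dbutils.fs) as dct2:
#     ...  # use dct2 here; the staged copy is deleted afterwards
```

The trade-off is that the model must not be used after the block ends, since its backing files are gone by then.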

slimexy
New Contributor II

Just to add: if the ML model is saved and then loaded within the same execution, calling load() does not raise the exception. Copying the model directory from the UC volume to ephemeral storage attached to the driver node is also a workaround (with no need to delete the tmpdir in DBFS after loading the model), but it works in single node mode only.
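
For anyone who wants to try the driver-local variant, here is a minimal sketch. It assumes the volume is readable through ordinary file APIs (as noted in the question, plain Python open() works) and that the loader accepts a `file:` URI. `copy_to_driver_local` is a made-up helper name, and as said above this should only be expected to work in single node mode, because executors cannot see the driver's local disk.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_driver_local(volume_path):
    """Copy a saved model directory to driver-local temp storage and
    return a file: URI pointing at the copy."""
    local_root = tempfile.mkdtemp(prefix="mlmodel_")
    # Keep the original directory name under the fresh temp directory.
    local_dir = Path(local_root) / Path(volume_path).name
    shutil.copytree(volume_path, local_dir)
    return f"file:{local_dir}"


# Hypothetical notebook usage:
# dct2 = DCT.load(copy_to_driver_local('/Volumes/demo/default/vol/dct'))
```

The copy lives in ephemeral storage, so it goes away with the cluster and nothing has to be cleaned up in DBFS.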
