Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Error: batch scoring with mlflow.keras flavor model

Miki
New Contributor II

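I am logging a trained Keras model using the following:

import mlflow
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

fe.log_model(
    model=model,
    artifact_path="wine_quality_prediction",
    flavor=mlflow.keras,
    training_set=training_set,
    registered_model_name=model_name,
)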

And when I call the following:

predictions_df = fe.score_batch(model_uri=f"models:/{model_name}/{latest_model_version}", df=batch_input_df)
display(predictions_df)

I get the following error:

OSError: [Errno 30] Read-only file system: '/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-62b1a-a1cdf-baa5b-1/mlflow/models/tmpajr94lkz/raw_model/data/model.keras'

I get the same 

I am essentially just trying to adapt this example to use a Keras model instead of a Random Forest. The code runs fine with a Random Forest, and it also runs fine if I use the native MLflow log_model(), load_model() and predict() functions. However, I do get the error if I load the model logged by FeatureEngineeringClient with mlflow.pyfunc.load_model() and call model.predict(). This makes me think the bug is specific to the way the Databricks FeatureEngineeringClient saves the Keras model.

I would appreciate any help with this issue.


Kaniz_Fatma
Community Manager

Hi @Miki, the OSError: [Errno 30] Read-only file system error occurs when a process tries to write to a file system that is mounted read-only. (Writing into a directory that does not exist raises [Errno 2] No such file or directory instead.)
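A quick way to tell these failure modes apart is to probe the target path and inspect the errno attribute of the resulting OSError. The sketch below uses only the Python standard library; the helper name describe_write_failure is hypothetical and not part of MLflow or Databricks.

```python
import errno
import os
import tempfile


def describe_write_failure(path: str) -> str:
    """Try to create a file at `path` and classify any OSError by its errno code."""
    try:
        with open(path, "w") as fh:
            fh.write("probe")
    except OSError as exc:
        if exc.errno == errno.EROFS:  # errno 30: the file system is mounted read-only
            return "read-only file system"
        if exc.errno == errno.ENOENT:  # the parent directory does not exist
            return "missing directory"
        if exc.errno in (errno.EACCES, errno.EPERM):  # exists, but not writable for this user
            return "permission denied"
        return f"other error: {exc}"
    # The probe succeeded: remove the file we just created.
    os.remove(path)
    return "writable"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(describe_write_failure(os.path.join(scratch, "probe.txt")))
        print(describe_write_failure(os.path.join(scratch, "missing", "probe.txt")))
```

Running it against the model directory from the traceback (on the node where the failure happens) would show whether the problem is a read-only mount rather than a missing path or a permissions issue.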

Let’s explore some possible solutions:

  1. Check the Path:

    • Ensure that the path you’ve provided for saving the Keras model is correct and points to a writable directory. Double-check the directory structure and permissions.
    • If you’re using a relative path, make sure it’s relative to the correct working directory.
  2. Absolute Path:

    • Instead of using a relative path, consider using an absolute path. Absolute paths start from the root directory and are less prone to errors related to working directories.
    • For example, use /tmp/some-file.zip instead of tmp/some-file.zip.
  3. Temporary Directory:

    • If you’re saving temporary files, consider using a temporary directory (such as /tmp on Unix-like systems) that is writable.
    • You can create a temporary directory within your code and use it for saving the model.
  4. Permissions:

    • Ensure that the user running the code has the necessary permissions to write to the specified directory.
    • If you’re running the code in a restricted environment (such as AWS Lambda), be aware of the read-only file system limitations.
  5. Keras Model Saving:

    • When saving a Keras model, make sure you’re using the appropriate method. For example:
      • Use model.save(filepath) to save the entire model.
      • Use model.save_weights(filepath) to save only the model weights.
    • Verify that the model is being saved correctly.
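Suggestions 3 and 5 can be combined when you control the save call yourself: create a fresh, writable temporary directory and save the model into it. This is a minimal sketch, assuming the object passed in exposes a Keras-style save(filepath) method; the helper name save_to_temp_dir and the "keras_model_" prefix are hypothetical.

```python
import os
import tempfile


def save_to_temp_dir(model, filename: str = "model.keras") -> str:
    """Save `model` into a new private temporary directory and return the file path.

    Assumes `model` has a Keras-style `save(filepath)` method; Keras 3 expects
    the native format's filename to end in `.keras`.
    """
    # mkdtemp creates a uniquely named directory under the system temp location
    # (e.g. /tmp on Linux) that only the current user can read and write.
    target_dir = tempfile.mkdtemp(prefix="keras_model_")
    filepath = os.path.join(target_dir, filename)
    model.save(filepath)
    return filepath
```

The caller owns the directory afterwards and should remove it (for example with shutil.rmtree) once the model is no longer needed. Note that this only helps when your own code performs the save; it does not change where a library writes its internal temporary files.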

Miki
New Contributor II

Hi Kaniz,

Thanks for the response. Apologies if I am missing something, but since I am calling the Databricks FeatureEngineeringClient.log_model() method directly, I have no option to specify the path the model is written to. The only relevant parameters are the artifact path and the registered model name, neither of which gives me enough control to implement the solutions you suggest. I could define a custom pyfunc instead of using the built-in mlflow.keras flavor, with my own save_model() and load_model() functions. However, I am struggling to see why this error happens only when I use FeatureEngineeringClient to log and load my model, while everything works fine with plain MLflow logging and loading (although that prevents me from using the automatic feature lookups provided by the feature store).

Am I missing something? 

Miki
New Contributor II

I figured it out based on this issue that someone posted: switching to a single-node cluster, which has no worker nodes whose file-system permissions could get in the way, makes this code work.
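For anyone hitting the same error, the single-node workaround can also be requested programmatically. The dict below sketches a Clusters API create request body using the single-node settings described in the Databricks documentation (zero workers, the singleNode cluster profile, a local Spark master, and the ResourceClass tag). The cluster name, runtime version, and node type are hypothetical placeholders, and newer API versions may express single-node compute differently, so check the current docs before relying on it.

```python
import json

# Sketch of a single-node cluster spec; placeholder values must be replaced
# with a runtime version and node type available in your workspace.
single_node_cluster = {
    "cluster_name": "fe-keras-scoring",  # hypothetical name
    "spark_version": "<runtime-version>",  # placeholder
    "node_type_id": "<node-type>",  # placeholder
    "num_workers": 0,  # no worker nodes: Spark tasks run on the driver
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",  # local Spark master using all driver cores
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

print(json.dumps(single_node_cluster, indent=2))
```

In the workspace UI, choosing the single-node option when creating the compute produces the same kind of cluster.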
