Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Pushing a SparkNLP Model to MLflow

Youssef1985
New Contributor

Hello Everyone,

I am trying to load a SparkNLP model (link for more details about the model if required) from the MLflow Registry.

To this end, I followed a tutorial and implemented the code below:

import mlflow.pyfunc
import pyspark.sql.functions as F

class LangDetectionModel(mlflow.pyfunc.PythonModel):
    def __init__(self):
        super().__init__()
        from sparknlp.pretrained import PipelineModel
        # Embed the SparkNLP model
        self._model = PipelineModel.load("/mnt/sparknlp_models/detect_language_375/")

    def predict(self, eval_data_lang_detect):
        # Apply the transform function for language detection
        list_columns = eval_data_lang_detect.columns
        model_output = (
            self._model.transform(eval_data_lang_detect)
            .select(list_columns + [F.col("language.result").getItem(0)])
            .withColumnRenamed("language.result[0]", "sparknlp_column")
        )
        return model_output
import cloudpickle
from sys import version_info

PYTHON_VERSION = "{}.{}.{}".format(version_info.major, version_info.minor, version_info.micro)

model_path = "my-langdetect-model"
reg_model_name = "NlpieLangDetection"
sparknlp_model = LangDetectionModel()

# Log MLflow entities and save the model
mlflow.set_tracking_uri("sqlite:///mlruns.db")

# Conda environment for this model
conda_env = {
    'channels': ['defaults', 'conda-forge'],
    'dependencies': [
        'python={}'.format(PYTHON_VERSION),
        'pip',
        {'pip': [
            'mlflow',
            'cloudpickle=={}'.format(cloudpickle.__version__),
            'NlpieLangDetection==0.0.1',
        ]},
    ],
    'name': 'mlflow-env',
}

# Save the model
mlflow.set_experiment('/Users/Youssef.Meguebli@sanofi.com/Language_Detection_Translation/LangDetectionTest')
with mlflow.start_run(run_name="Nlpie Language Detection") as run:
    model_path = f"{model_path}-{run.info.run_uuid}"
    mlflow.log_param("algorithm", "SparNLPLangDetection")
    mlflow.pyfunc.save_model(path=model_path, python_model=sparknlp_model, conda_env=conda_env)

I am getting an error on the last piece of code, where I try to save the model to the MLflow registry.

Below is the error I am getting:

TypeError: cannot pickle '_thread.RLock' object
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command-2121909764500367> in <module>
      4     model_path = f"{model_path}-{run.info.run_uuid}"
      5     mlflow.log_param("algorithm", "SparNLPLangDetection")
----> 6     mlflow.pyfunc.save_model(path=model_path, python_model=sparknlp_model, conda_env=conda_env)
 
/databricks/python/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py in save_model(path, loader_module, data_path, code_path, conda_env, mlflow_model, python_model, artifacts, signature, input_example, pip_requirements, extra_pip_requirements, **kwargs)
   1467         )
   1468     elif second_argument_set_specified:
-> 1469         return mlflow.pyfunc.model._save_model_with_class_artifacts_params(
   1470             path=path,
   1471             python_model=python_model,
 
/databricks/python/lib/python3.8/site-packages/mlflow/pyfunc/model.py in _save_model_with_class_artifacts_params(path, python_model, artifacts, conda_env, code_paths, mlflow_model, pip_requirements, extra_pip_requirements)
    162         saved_python_model_subpath = "python_model.pkl"
    163         with open(os.path.join(path, saved_python_model_subpath), "wb") as out:
--> 164             cloudpickle.dump(python_model, out)
    165         custom_model_config_kwargs[CONFIG_KEY_PYTHON_MODEL] = saved_python_model_subpath
    166     else:
 
/databricks/python/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(obj, file, protocol, buffer_callback)
     53         compatibility with older versions of Python.
     54         """
---> 55         CloudPickler(
     56             file, protocol=protocol, buffer_callback=buffer_callback
     57         ).dump(obj)
 
/databricks/python/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    631     def dump(self, obj):
    632         try:
--> 633             return Pickler.dump(self, obj)
    634         except RuntimeError as e:
    635             if "recursion" in e.args[0]:
 
TypeError: cannot pickle '_thread.RLock' object
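(For reference, this error is not specific to MLflow: `cloudpickle`, like the standard `pickle`, refuses any object graph that contains a thread lock, and a live Spark/JVM-backed pipeline holds such locks. A minimal sketch of the same failure, with a placeholder `Holder` class standing in for the wrapped model:)

```python
import pickle
import threading

class Holder:
    """Stands in for a PythonModel whose attribute holds a live Spark object."""
    def __init__(self):
        self._lock = threading.RLock()  # JVM-backed Spark objects carry locks like this

try:
    pickle.dumps(Holder())
except TypeError as e:
    print(e)  # e.g. "cannot pickle '_thread.RLock' object"
```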

Please let me know if you need any further details.

Many thanks in advance for your support.

2 REPLIES

Kari
New Contributor II

Hi.

The problem might be with pickling the language model.

Have you tried using mlflow.spark.log_model to save the model? Spark ML models cannot be serialized as pickle files. They are serialized in a language-neutral hierarchical format that can be read by both Python and Scala, as in the "sparkml" directory below.
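A sketch of that approach (a wrapper under stated assumptions: `pipeline_model` is a placeholder for the fitted SparkNLP PipelineModel, and the import is deferred so the snippet only needs mlflow and pyspark when actually called):

```python
def log_sparknlp_pipeline(pipeline_model,
                          artifact_path="spark-model",
                          registered_model_name="NlpieLangDetection"):
    """Log a fitted Spark ML / SparkNLP PipelineModel with the Spark flavor,
    which writes the language-neutral layout shown below instead of pickling."""
    import mlflow.spark  # deferred: requires mlflow and pyspark at call time

    mlflow.spark.log_model(
        pipeline_model,
        artifact_path=artifact_path,
        registered_model_name=registered_model_name,
    )
```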

spark-model
+-sparkml/
| +-stages/
| | +-1_DecisionTreeRegressor_6aae1e6c3fed/
| | | +-data/
| | | | +-part-00000-a4b9cb99-abd2-40c3-90d2-a46b44926263-c000.snappy.parquet
| | | | +-.part-00000-a4b9cb99-abd2-40c3-90d2-a46b44926263-c000.snappy.parquet.crc
| | | | +-._SUCCESS.crc
| | | |
| | | +-metadata/
| | |   +-part-00000
| | |   +-.part-00000.crc
| | |   +-._SUCCESS.crc
| | |  
| | +-0_VectorAssembler_ce8bcea8c5b3/
| |   +-metadata/
| |     +-part-00000
| |     +-.part-00000.crc
| |     +-._SUCCESS.crc
| |    
| +-metadata/
|   +-part-00000
|   +-.part-00000.crc
|   +-._SUCCESS.crc
|  
+-requirements.txt
+-python_env.yaml
+-conda.yaml
+-MLmodel
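Loading it back is symmetric. A minimal sketch, assuming the model was logged with the Spark flavor under the registered name above (the `models:/.../1` version in the URI is illustrative):

```python
def load_registered_spark_model(model_uri="models:/NlpieLangDetection/1"):
    """Return the logged PipelineModel from the MLflow registry."""
    import mlflow.spark  # deferred: requires mlflow and pyspark at call time

    return mlflow.spark.load_model(model_uri)
```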

Another resource on models and deployment is this post on Medium: "Effortless models deployment with Mlflow — Packing a NLP product review classifier from HuggingFace" (https://santiagof.medium.com/effortless-models-deployment-with-mlflow-packing-a-nlp-product-review-classifier-from-huggingface-13be2650333)

tala
New Contributor II
