Databricks Community

Shreyash · ‎04-16-2024

I am trying to serve a pyspark model using an endpoint. I was able to load and register the model normally. I could also load that model and perform inference but while serving the model, I am getting the following error:

[94fffqts54] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[94fffqts54] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[94fffqts54] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[94fffqts54] An error occurred while loading the model. An error occurred while calling o63.load.
[94fffqts54] : java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler
[94fffqts54] at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)

My conf file looks like this:

conda_env_conf = {
    "channels": ["defaults"],
    "dependencies": [
        "python=3.9.5",
        "pip",
        {
            "pip": [
                "spark-nlp==5.3.1",
                "pyspark==3.3.2",
                "mlflow==2.9.2"
            ],
            "maven": [
              {"coordinates":"com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.1"},
              {"coordinates":"mx.com.sw:sdk-java18:0.0.1.5"}
            ]
        },
    ],
    "name": "bert_env",
}

Please help!

Shreyash · ‎04-17-2024

Hey Kaniz,

Thank you for that response. Although I passed in the jars via the conf as mentioned above. I tried passing it in the cluster conf as well. I also checked the version compatibility and it seems to be fine. Still does not work.

Shreyash · ‎04-18-2024

Thanks for the reply Kaniz. I was able to recrete the model locally and it worked when I gave it the right jars using spark.config. The catch is that I am trying to do this in mlflow and I have no way or specifying this explicitly there. How can I give these jars in mlflow ?

Rajora · ‎05-15-2024

I'm having the same problem and have tried various solutions with no luck. I found some potentially relevant information on the following link: https://www.johnsnowlabs.com/serving-spark-nlp-via-api-3-3-databricks-jobs-and-mlflow-serve-apis/

In the link I found the following answer:

IMPORTANT: As of 17/02/2022, there is an issue being studied by the Databricks team, regarding the creation on the fly of job clusters to serve MLFlow models that require configuring the Spark Session with specific jars. This will be fixed in later versions of Databricks. In the meantime, the way to go is using Databricks Jobs API.

Has this already been resolved? Would it be possible to have a hands on task to show how to solve this?