04-16-2024 11:49 AM
I am trying to serve a PySpark model using an endpoint. I was able to load and register the model normally, and I can also load the model and perform inference, but while serving it I get the following error:
[94fffqts54] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[94fffqts54] ERROR StatusLogger Reconfiguration failed: No configuration found for '5ffd2b27' at 'null' in 'null'
[94fffqts54] ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
[94fffqts54] An error occurred while loading the model. An error occurred while calling o63.load.
[94fffqts54] : java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler
[94fffqts54] at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
My conf file looks like this:
conda_env_conf = {
    "channels": ["defaults"],
    "dependencies": [
        "python=3.9.5",
        "pip",
        {
            "pip": [
                "spark-nlp==5.3.1",
                "pyspark==3.3.2",
                "mlflow==2.9.2",
            ],
            "maven": [
                {"coordinates": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.1"},
                {"coordinates": "mx.com.sw:sdk-java18:0.0.1.5"},
            ],
        },
    ],
    "name": "bert_env",
}
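For context, the model is logged and registered roughly like this (a simplified sketch; pipeline_model and the registered model name are placeholders):

import mlflow
import mlflow.spark

# Simplified sketch of how the model is logged with the conda env above.
# pipeline_model is the fitted Spark NLP PipelineModel (placeholder name).
with mlflow.start_run():
    mlflow.spark.log_model(
        spark_model=pipeline_model,
        artifact_path="model",
        conda_env=conda_env_conf,
        registered_model_name="bert_pipeline",  # placeholder name
    )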
Please help!
04-17-2024 04:09 AM - edited 04-17-2024 04:09 AM
Hi @Shreyash, it looks like your code is encountering a java.lang.ClassNotFoundException for the com.johnsnowlabs.nlp.DocumentAssembler class while serving your PySpark model. This error occurs when the required class is not found on the classpath. A few ways to make the Spark NLP JAR available:

1. Start the session with sparknlp.start(). The JAR will be downloaded automatically.
2. Launch pyspark with the --jars switch. You can download the JAR manually from the releases page.
3. Launch pyspark and pass --packages. For example:
   pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.5
   Make sure to choose the version you need.
4. Add the JAR to the executor classpath:
   spark.conf.set("spark.executor.extraClassPath", "/path/to/spark-nlp.jar")
04-17-2024 09:10 AM
Hey Kaniz,
Thank you for that response. I passed in the jars via the conf as mentioned above, and I also tried passing them in the cluster conf. I checked the version compatibility as well and it seems fine, but it still does not work.
04-17-2024 11:17 PM
Hi @Shreyash,
1. Pass the --driver-class-path and --conf spark.executor.extraClassPath options when submitting your Spark job.
2. Make sure your environment variables (SPARK_HOME, PYSPARK_PYTHON, etc.) are consistent across your local machine and the cluster.
3. For the Log4j StatusLogger errors, point the driver at an explicit configuration file (--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/path/to/log4j.properties").
4. Add the dependency at runtime with SparkContext.addPyFile() or SparkSession.sparkContext.addJar(), for example:

from pyspark import SparkContext

sc = SparkContext()
sc.addPyFile("/path/to/spark-nlp_2.12-5.3.1.jar")
04-18-2024 08:42 AM
Thanks for the reply, Kaniz. I was able to recreate the model locally, and it worked when I gave it the right jars using spark.config. The catch is that I am trying to do this in MLflow, and I have no way of specifying this explicitly there. How can I supply these jars in MLflow?
05-15-2024 08:10 PM
I'm having the same problem and have tried various solutions with no luck. I found some potentially relevant information at the following link: https://www.johnsnowlabs.com/serving-spark-nlp-via-api-3-3-databricks-jobs-and-mlflow-serve-apis/
On that page I found the following note:
IMPORTANT: As of 17/02/2022, there is an issue being studied by the Databricks team, regarding the creation on the fly of job clusters to serve MLFlow models that require configuring the Spark Session with specific jars. This will be fixed in later versions of Databricks. In the meantime, the way to go is using Databricks Jobs API.
Has this already been resolved? Would it be possible to have a hands-on example showing how to solve this?