Unable to Infer Spark ML Pipeline model when built using Custom Preprocessing Stages

416412
New Contributor

We are building an internal use case on PySpark. Our data requires a lot of pre-processing, and some of the transformations we need aren't available in the pyspark.ml module, so we implemented them as custom Spark ML pipeline stages. These custom pre-processing stages extend the Estimator, HasInputCol, HasOutputCol, MLWritable and MLReadable classes, i.e.,

 

from pyspark.ml.pipeline import Transformer, Estimator
from pyspark.ml.param.shared import HasInputCol, HasOutputCol

 

 
We were able to tune the pipeline with Hyperopt and to train and evaluate it on the whole dataset. We also logged the model with MLflow. However, when we tried to load the pipeline model for inference, it failed in the custom stages' __init__() method. We don't understand why the constructor is called when the model is loaded, even though the class variables were already fitted within the object during the training (fitting) phase.

Here's some part of the custom transformer, which is having issues:
code_snipped.png
 
Here's the screenshot of the error we are facing:
Screenshot 2023-07-14 165821.png
 
If anyone has worked on this kind of development, please help! It would be great if someone could share some working examples.