<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unable to Infer Spark ML Pipeline model when built using Custom Preprocessing Stages in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/unable-to-infer-spark-ml-pipeline-model-when-built-using-custom/m-p/37965#M1970</link>
    <description></description>
    <pubDate>Wed, 19 Jul 2023 15:51:50 GMT</pubDate>
    <dc:creator>416412</dc:creator>
    <dc:date>2023-07-19T15:51:50Z</dc:date>
    <item>
      <title>Unable to Infer Spark ML Pipeline model when built using Custom Preprocessing Stages</title>
      <link>https://community.databricks.com/t5/machine-learning/unable-to-infer-spark-ml-pipeline-model-when-built-using-custom/m-p/37965#M1970</link>
      <description>&lt;P&gt;We are building an internal use case on PySpark. Our data requires a lot of pre-processing, and some of the transformations we need are not available in the &lt;STRONG&gt;pyspark.ml&lt;/STRONG&gt; module, so we wrote custom Spark ML pipeline stages. These custom pre-processing stages extend the Transformer, Estimator, HasInputCol, HasOutputCol, MLWritable and MLReadable classes, i.e.,&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.ml.pipeline import Transformer, Estimator
from pyspark.ml.param.shared import HasInputCol, HasOutputCol
from pyspark.ml.util import MLReadable, MLWritable&lt;/LI-CODE&gt;&lt;DIV&gt;We were able to tune the pipeline with &lt;STRONG&gt;Hyperopt&lt;/STRONG&gt;, train and evaluate it on the whole dataset, and log the model with MLflow. However, when we tried to load the pipeline model for inference, loading failed in the custom stages' &lt;EM&gt;&lt;STRONG&gt;__init__()&lt;/STRONG&gt;&lt;/EM&gt; method. We do not understand why the constructor is called when the model is loaded, given that the object's attributes were already set during the training (fitting) phase.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Here is part of the custom transformer that is causing the issue:&lt;/DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="code_snipped.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2894i0862054005A3BB6A/image-size/large/is-moderation-mode/true?v=v2&amp;amp;px=999" role="button" title="code_snipped.png" alt="code_snipped.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;DIV&gt;Here is a screenshot of the error we are facing:&lt;/DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2023-07-14 165821.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2892iF82A48B10E9E4EF8/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Screenshot 2023-07-14 165821.png" alt="Screenshot 2023-07-14 165821.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;DIV&gt;If anyone has worked on this kind of development, please help! It would be great if someone could share a working example.&lt;/DIV&gt;</description>
      <pubDate>Wed, 19 Jul 2023 15:51:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/unable-to-infer-spark-ml-pipeline-model-when-built-using-custom/m-p/37965#M1970</guid>
      <dc:creator>416412</dc:creator>
      <dc:date>2023-07-19T15:51:50Z</dc:date>
    </item>
  </channel>
</rss>

