<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>PySpark custom Transformer class - AttributeError: 'DummyMod' object has no attribute 'MyTransformer' in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/pyspark-custom-transformer-class-attributeerror-dummymod-object/m-p/75669#M3386</link>
    <description>&lt;P&gt;I am trying to create a custom transformer as a stage in my pipeline. A few of the transformations are done via Spark NLP and the next few using MLlib. To pass the result of a Spark NLP transformation at one stage to the next MLlib transformation, I need to extract the spark_nlp_col.result column and pass it along, and I am using a custom transformer stage for that.&lt;BR /&gt;After I fit my pipeline, I am able to persist it, but when I load it back I get an error:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;AttributeError: 'DummyMod' object has no attribute 'MyTransformer'&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Here is my class:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.ml import Transformer
from pyspark.ml.param import Param, Params, TypeConverters
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable

class MyTransformer(Transformer, DefaultParamsWritable, DefaultParamsReadable):
    inputCol = Param(Params._dummy(), "inputCol", "input column name", TypeConverters.toString)
    outputCol = Param(Params._dummy(), "outputCol", "output column name", TypeConverters.toString)

    def __init__(self, inputCol=None, outputCol=None):
        super(MyTransformer, self).__init__()
        self._setDefault(inputCol=None, outputCol=None)
        self._set(inputCol=inputCol, outputCol=outputCol)

    def getInputCol(self):
        return self.getOrDefault(self.inputCol)

    def setInputCol(self, inputCol):
        self._set(inputCol=inputCol)

    def getOutputCol(self):
        return self.getOrDefault(self.outputCol)

    def setOutputCol(self, outputCol):
        self._set(outputCol=outputCol)

    def _transform(self, dataset):
        in_col = self.getInputCol()
        out_col = self.getOutputCol()

        # pull the nested .result field out of the Spark NLP struct column
        final_in_col = in_col + ".result"
        result = dataset.withColumn(out_col, dataset[final_in_col])
        return result&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have created a simple wrapper function over it for standardisation, and then used it to create the pipeline, fit it, and save it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def extract_col(cols, in_suffix, out_suffix):
    return [MyTransformer(inputCol=col+in_suffix, outputCol=col+out_suffix) for col in cols]&lt;/LI-CODE&gt;&lt;LI-CODE lang="python"&gt;'''
stages before custom transformer
'''
extractors = extract_col(cols, "_in", "_out")
'''
stages after custom transformer
'''

stages = s1 + s2 + .. + extractors + .. + sn-1 + sn
pipeline = Pipeline(stages=stages)
fit_pipeline = pipeline.fit(data)
fit_pipeline.write().overwrite().save(path_to_store_at)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is how I am reading it back:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.ml import PipelineModel

saved_pipeline = PipelineModel.load("path_where_stored")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And that is when I encounter the error.&lt;BR /&gt;I have tried multiple ways of writing the custom class, using HasInputCol, HasOutputCol, etc., but nothing has worked so far. Any idea how I can resolve it?&lt;/P&gt;</description>
    <pubDate>Tue, 25 Jun 2024 07:31:49 GMT</pubDate>
    <dc:creator>simranisanewbie</dc:creator>
    <dc:date>2024-06-25T07:31:49Z</dc:date>
    <item>
      <title>PySpark custom Transformer class - AttributeError: 'DummyMod' object has no attribute 'MyTransformer'</title>
      <link>https://community.databricks.com/t5/machine-learning/pyspark-custom-transformer-class-attributeerror-dummymod-object/m-p/75669#M3386</link>
      <description>&lt;P&gt;I am trying to create a custom transformer as a stage in my pipeline. A few of the transformations are done via Spark NLP and the next few using MLlib. To pass the result of a Spark NLP transformation at one stage to the next MLlib transformation, I need to extract the spark_nlp_col.result column and pass it along, and I am using a custom transformer stage for that.&lt;BR /&gt;After I fit my pipeline, I am able to persist it, but when I load it back I get an error:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;AttributeError: 'DummyMod' object has no attribute 'MyTransformer'&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Here is my class:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.ml import Transformer
from pyspark.ml.param import Param, Params, TypeConverters
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable

class MyTransformer(Transformer, DefaultParamsWritable, DefaultParamsReadable):
    inputCol = Param(Params._dummy(), "inputCol", "input column name", TypeConverters.toString)
    outputCol = Param(Params._dummy(), "outputCol", "output column name", TypeConverters.toString)

    def __init__(self, inputCol=None, outputCol=None):
        super(MyTransformer, self).__init__()
        self._setDefault(inputCol=None, outputCol=None)
        self._set(inputCol=inputCol, outputCol=outputCol)

    def getInputCol(self):
        return self.getOrDefault(self.inputCol)

    def setInputCol(self, inputCol):
        self._set(inputCol=inputCol)

    def getOutputCol(self):
        return self.getOrDefault(self.outputCol)

    def setOutputCol(self, outputCol):
        self._set(outputCol=outputCol)

    def _transform(self, dataset):
        in_col = self.getInputCol()
        out_col = self.getOutputCol()

        # pull the nested .result field out of the Spark NLP struct column
        final_in_col = in_col + ".result"
        result = dataset.withColumn(out_col, dataset[final_in_col])
        return result&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have created a simple wrapper function over it for standardisation, and then used it to create the pipeline, fit it, and save it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def extract_col(cols, in_suffix, out_suffix):
    return [MyTransformer(inputCol=col+in_suffix, outputCol=col+out_suffix) for col in cols]&lt;/LI-CODE&gt;&lt;LI-CODE lang="python"&gt;'''
stages before custom transformer
'''
extractors = extract_col(cols, "_in", "_out")
'''
stages after custom transformer
'''

stages = s1 + s2 + .. + extractors + .. + sn-1 + sn
pipeline = Pipeline(stages=stages)
fit_pipeline = pipeline.fit(data)
fit_pipeline.write().overwrite().save(path_to_store_at)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is how I am reading it back:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.ml import PipelineModel

saved_pipeline = PipelineModel.load("path_where_stored")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And that is when I encounter the error.&lt;BR /&gt;I have tried multiple ways of writing the custom class, using HasInputCol, HasOutputCol, etc., but nothing has worked so far. Any idea how I can resolve it?&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jun 2024 07:31:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/pyspark-custom-transformer-class-attributeerror-dummymod-object/m-p/75669#M3386</guid>
      <dc:creator>simranisanewbie</dc:creator>
      <dc:date>2024-06-25T07:31:49Z</dc:date>
    </item>
  </channel>
</rss>

