<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Are UDFs necessary for applying models from ML libraries at scale ? in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10740#M529</link>
    <description>&lt;P&gt;MLlib is in maintenance mode, and in most cases creating a model does not involve UDFs.&lt;/P&gt;</description>
    <pubDate>Wed, 08 Feb 2023 19:17:49 GMT</pubDate>
    <dc:creator>Manoj12421</dc:creator>
    <dc:date>2023-02-08T19:17:49Z</dc:date>
    <item>
      <title>Are UDFs necessary for applying models from ML libraries at scale ?</title>
      <link>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10737#M526</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models can be applied faster, in a distributed manner, when used in pandas UDFs or with the mapInPandas() method.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Spark MLlib models don't need this kind of refactoring, since they are built for distributed execution, but I was wondering whether this kind of UDF is necessary for other libraries such as TensorFlow, PyTorch, spaCy, Keras, etc.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2023 21:14:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10737#M526</guid>
      <dc:creator>anvil</dc:creator>
      <dc:date>2023-01-24T21:14:46Z</dc:date>
    </item>
    <item>
      <title>Re: Are UDFs necessary for applying models from ML libraries at scale ?</title>
      <link>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10738#M527</link>
      <description>&lt;P&gt;The RDD-based MLlib API is in maintenance mode; the DataFrame-based Spark ML API is what is mainly used now. In most cases, creating the model does not involve UDFs (see &lt;A href="https://www.databricks.com/spark/getting-started-with-apache-spark/machine-learning" target="test_blank"&gt;https://www.databricks.com/spark/getting-started-with-apache-spark/machine-learning&lt;/A&gt;). Applying a model through a UDF, however, usually runs in a distributed way. For example, when you append data to your table, you can use a UDF to run predictions with a model logged to MLflow (even on a real-time stream):&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import mlflow
# Wrap the MLflow model logged under run_id as a Spark UDF (assumes an active `spark` session)
predict = mlflow.pyfunc.spark_udf(spark, model_uri=f"runs:/{run_id}/model")
# Apply the model row by row across the cluster; every column of testDF is passed as a model input
predDF = testDF.withColumn("prediction", predict(*testDF.columns))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2023 10:14:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10738#M527</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2023-01-25T10:14:32Z</dc:date>
    </item>
    <item>
      <title>Re: Are UDFs necessary for applying models from ML libraries at scale ?</title>
      <link>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10739#M528</link>
      <description>&lt;P&gt;UDFs are not strictly required for applying models from ML libraries at scale, but they can help with both performance and ease of use.&lt;/P&gt;&lt;P&gt;Libraries such as TensorFlow, PyTorch, spaCy, and Keras are not built to distribute inference across a Spark cluster by default. For these, pandas UDFs or the mapInPandas() method give you a way to scale the model efficiently by running it in parallel across the Spark cluster.&lt;/P&gt;&lt;P&gt;Ultimately, it depends on the specific requirements of your project.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Feb 2023 07:44:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10739#M528</guid>
      <dc:creator>Devarsh</dc:creator>
      <dc:date>2023-02-01T07:44:51Z</dc:date>
    </item>
    <item>
      <title>Re: Are UDFs necessary for applying models from ML libraries at scale ?</title>
      <link>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10740#M529</link>
      <description>&lt;P&gt;MLlib is in maintenance mode, and in most cases creating a model does not involve UDFs.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Feb 2023 19:17:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/are-udfs-necessary-for-applying-models-from-ml-libraries-at/m-p/10740#M529</guid>
      <dc:creator>Manoj12421</dc:creator>
      <dc:date>2023-02-08T19:17:49Z</dc:date>
    </item>
  </channel>
</rss>

