<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance issue while calling mlflow endpoint in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/58822#M6824</link>
    <description>&lt;P&gt;Thank you, Kaniz, for the suggestions; they are really helpful. I also tried applyInPandas, but I am not sure whether it is better than a Spark UDF. If not, could you help me convert this function to a pandas UDF or another optimized approach?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;from pyspark.sql.types import StringType, StructField, StructType&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;def myfunc(input_text):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp;result = mlflowmodel.predict(input_text)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp;return result&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;def myfuncUDF(pdf):&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; pdf['test_result'] = pdf["input_text"].apply(myfunc)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; return pdf&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;df = spark.sql("select * from test")&lt;/DIV&gt;&lt;DIV&gt;# applyInPandas needs an output schema: the input columns plus test_result&lt;/DIV&gt;&lt;DIV&gt;out_schema = StructType(df.schema.fields + [StructField("test_result", StringType())])&lt;/DIV&gt;&lt;DIV&gt;df = df.groupBy("id").applyInPandas(myfuncUDF, schema=out_schema)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Regards,&lt;/DIV&gt;&lt;DIV&gt;Sanjay&lt;/DIV&gt;</description>
    <pubDate>Wed, 31 Jan 2024 12:28:29 GMT</pubDate>
    <dc:creator>sanjay</dc:creator>
    <dc:date>2024-01-31T12:28:29Z</dc:date>
    <item>
      <title>Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/58558#M6822</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a PySpark DataFrame and a PySpark UDF that calls an MLflow model for each row, but its performance is very slow.&lt;/P&gt;&lt;P&gt;Here is sample code:&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import udf&lt;BR /&gt;from pyspark.sql.types import StringType&lt;/P&gt;&lt;P&gt;def myfunc(input_text):&lt;BR /&gt;&amp;nbsp; &amp;nbsp;result = mlflowmodel.predict(input_text)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;return result&lt;/P&gt;&lt;P&gt;myfuncUDF = udf(myfunc, StringType())&lt;/P&gt;&lt;P&gt;df = spark.sql("select * from test")&lt;BR /&gt;df = df.withColumn("test_result", myfuncUDF("input_text"))&lt;/P&gt;&lt;P&gt;Please suggest how to improve the performance.&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Sanjay&lt;/P&gt;</description>
      <pubDate>Sun, 28 Jan 2024 19:24:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/58558#M6822</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2024-01-28T19:24:38Z</dc:date>
    </item>
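    <!--
A row-at-a-time udf pays Python serialization overhead and makes a separate predict() call for every row. A common alternative is an Iterator-of-Series pandas UDF, which the Spark documentation describes as a good fit when the UDF must initialize state (such as loading a model) once and then score many Arrow batches. This is a minimal sketch, assuming the model's predict() accepts a batch (here a pandas DataFrame with an input_text column); StubModel and load_model are hypothetical placeholders for something like mlflow.pyfunc.load_model(model_uri).

```python
from typing import Iterator

import pandas as pd


class StubModel:
    """Hypothetical stand-in for the thread's mlflowmodel."""

    def predict(self, batch: pd.DataFrame) -> list:
        # Placeholder "prediction": upper-case each input text.
        return [text.upper() for text in batch["input_text"]]


def load_model() -> StubModel:
    # Hypothetical: replace with e.g. mlflow.pyfunc.load_model(model_uri).
    return StubModel()


def predict_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    """Load the model once per task, then score each Arrow batch in one call."""
    model = load_model()
    for input_text in batches:
        preds = model.predict(pd.DataFrame({"input_text": input_text}))
        yield pd.Series(preds, index=input_text.index)


def add_predictions(df):
    """Attach a test_result column to a Spark DataFrame with an input_text column."""
    # Imported here so the pandas logic above can be exercised without Spark.
    from pyspark.sql.functions import pandas_udf

    predict_udf = pandas_udf(predict_batches, returnType="string")
    return df.withColumn("test_result", predict_udf("input_text"))
```

Usage would be df = add_predictions(spark.sql("select * from test")). The number of rows per batch is governed by spark.sql.execution.arrow.maxRecordsPerBatch. MLflow also provides mlflow.pyfunc.spark_udf(spark, model_uri, ...), which wraps a logged model in a batched Spark UDF and may avoid writing this wrapper by hand.
    -->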
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/58822#M6824</link>
      <description>&lt;P&gt;Thank you, Kaniz, for the suggestions; they are really helpful. I also tried applyInPandas, but I am not sure whether it is better than a Spark UDF. If not, could you help me convert this function to a pandas UDF or another optimized approach?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;from pyspark.sql.types import StringType, StructField, StructType&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;def myfunc(input_text):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp;result = mlflowmodel.predict(input_text)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp;return result&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;def myfuncUDF(pdf):&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; pdf['test_result'] = pdf["input_text"].apply(myfunc)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; return pdf&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;df = spark.sql("select * from test")&lt;/DIV&gt;&lt;DIV&gt;# applyInPandas needs an output schema: the input columns plus test_result&lt;/DIV&gt;&lt;DIV&gt;out_schema = StructType(df.schema.fields + [StructField("test_result", StringType())])&lt;/DIV&gt;&lt;DIV&gt;df = df.groupBy("id").applyInPandas(myfuncUDF, schema=out_schema)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Regards,&lt;/DIV&gt;&lt;DIV&gt;Sanjay&lt;/DIV&gt;</description>
      <pubDate>Wed, 31 Jan 2024 12:28:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/58822#M6824</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2024-01-31T12:28:29Z</dc:date>
    </item>
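    <!--
groupBy("id").applyInPandas(...) shuffles the data by id, and when id is unique each group holds a single row, so predict() still runs roughly once per row; applyInPandas also requires an output schema argument. When no grouping is actually needed, DataFrame.mapInPandas hands each partition to the function as an iterator of pandas DataFrames without a shuffle. A minimal sketch, assuming predict() can score a whole DataFrame batch; fake_predict is a hypothetical placeholder for that model call.

```python
from typing import Iterator

import pandas as pd


def fake_predict(batch: pd.DataFrame) -> list:
    # Hypothetical placeholder for mlflowmodel.predict on a whole batch.
    return [text.upper() for text in batch["input_text"]]


def score_partition(pdfs: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Score every pandas chunk of a partition with one predict call per chunk."""
    for pdf in pdfs:
        pdf = pdf.copy()  # avoid mutating the batch Spark handed us
        pdf["test_result"] = fake_predict(pdf[["input_text"]])
        yield pdf


def add_predictions(df):
    """Run score_partition over a Spark DataFrame without a groupBy shuffle."""
    from pyspark.sql.types import StringType, StructField, StructType

    # Output schema = all input columns plus the new string column.
    out_schema = StructType(df.schema.fields + [StructField("test_result", StringType())])
    return df.mapInPandas(score_partition, schema=out_schema)
```

Building the schema from df.schema.fields keeps the sketch independent of the actual columns in the test table, which the thread does not show.
    -->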
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59560#M6826</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, it's really helpful and it worked. Another quick question: I have to pass two parameters as input to&amp;nbsp;&lt;SPAN&gt;myfunc. How can I pass multiple parameters?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;import pandas as pd&lt;BR /&gt;from pyspark.sql.functions import pandas_udf&lt;BR /&gt;from pyspark.sql.types import StringType&lt;/P&gt;&lt;P&gt;def myfunc(input_text, param2):&lt;BR /&gt;&amp;nbsp; &amp;nbsp;# Assuming mlflowmodel is defined elsewhere&lt;BR /&gt;&amp;nbsp; &amp;nbsp;result = mlflowmodel.predict(input_text, param2)&lt;BR /&gt;&amp;nbsp; &amp;nbsp;return result&lt;/P&gt;&lt;P&gt;# Create a Pandas UDF&lt;BR /&gt;@pandas_udf(StringType())&lt;BR /&gt;def myfunc_udf(input_text_series: pd.Series, param2_series: pd.Series) -&amp;gt; pd.Series:&lt;BR /&gt;&amp;nbsp; &amp;nbsp;return input_text_series.apply(myfunc)&amp;nbsp; # how do I also pass param2_series here?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 07:36:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59560#M6826</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2024-02-07T07:36:39Z</dc:date>
    </item>
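    <!--
With a Series-to-Series pandas UDF, Spark passes one pandas Series per input column, all covering the same rows of the batch, so a two-argument function can be applied by pairing the Series element by element. A sketch under stated assumptions: myfunc_pair is a hypothetical stand-in for mlflowmodel.predict(input_text, param2), and the column names input_text1 and input_text2 are taken from a later post in this thread.

```python
import pandas as pd


def myfunc_pair(input_text, param2):
    # Hypothetical stand-in for mlflowmodel.predict(input_text, param2).
    return f"{input_text}|{param2}"


def myfunc_batch(input_text_series: pd.Series, param2_series: pd.Series) -> pd.Series:
    """Call myfunc_pair once per row, pairing the two column batches positionally."""
    results = [myfunc_pair(t, p) for t, p in zip(input_text_series, param2_series)]
    return pd.Series(results, index=input_text_series.index)


def add_result(df):
    from pyspark.sql.functions import pandas_udf

    myfunc_udf = pandas_udf(myfunc_batch, returnType="string")
    return df.withColumn("result", myfunc_udf("input_text1", "input_text2"))
```

This still calls the model once per row inside each batch. If the model's predict() accepts a batch, building pd.DataFrame({"input_text": input_text_series, "param2": param2_series}) and calling predict once per batch would avoid the Python loop.
    -->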
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59566#M6827</link>
      <description>&lt;P&gt;I also need to send two arguments to myfunc, so I have the same quick question: how do I pass in multiple parameters?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 09:40:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59566#M6827</guid>
      <dc:creator>Isabeente</dc:creator>
      <dc:date>2024-02-07T09:40:08Z</dc:date>
    </item>
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59593#M6829</link>
      <description>&lt;P&gt;Hi Kaniz,&lt;/P&gt;&lt;P&gt;I started getting the following error after using myfunc_udf with two parameters:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;PythonException: 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Sanjay&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 13:05:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59593#M6829</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2024-02-07T13:05:39Z</dc:date>
    </item>
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59603#M6830</link>
      <description>&lt;P&gt;Hello Sanjay,&lt;/P&gt;&lt;P&gt;Could you please share your code snippet with the latest changes?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 14:12:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59603#M6830</guid>
      <dc:creator>BR_DatabricksAI</dc:creator>
      <dc:date>2024-02-07T14:12:19Z</dc:date>
    </item>
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59606#M6831</link>
      <description>&lt;DIV&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Here is the code snippet with the latest changes.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;import pandas as pd&lt;/DIV&gt;&lt;DIV&gt;from pyspark.sql.functions import pandas_udf&lt;/DIV&gt;&lt;DIV&gt;from pyspark.sql.types import StringType&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;def myfunc(t1, t2):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; return 'test'&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;@pandas_udf(StringType())&lt;/DIV&gt;&lt;DIV&gt;def myfunc_udf(input_text_series: pd.Series, param2_series: pd.Series) -&amp;gt; pd.Series:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; def apply_myfunc(input_text, param2):&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; return myfunc(input_text, param2)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; return input_text_series.apply(apply_myfunc, param2_series)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;df.withColumn("result", myfunc_udf("input_text1", "input_text2"))&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;But I am getting an error while running this:&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;&lt;SPAN&gt;PythonException: 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()&lt;/SPAN&gt;&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 07 Feb 2024 15:39:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59606#M6831</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2024-02-07T15:39:45Z</dc:date>
    </item>
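    <!--
The ValueError most likely comes from input_text_series.apply(apply_myfunc, param2_series): in pandas 1.x and 2.x the second positional parameter of Series.apply is convert_dtype, not an extra argument for the function, so pandas ends up evaluating a whole Series as a boolean. Passing args=(param2_series,) would not help either, because apply forwards the same object to every call. One element-wise alternative is Series.combine, which calls a function on each aligned pair of values. In this sketch the body of myfunc is a hypothetical placeholder; the column names come from the snippet above.

```python
import pandas as pd


def myfunc(t1, t2):
    # Hypothetical placeholder body; the post's version returns 'test'.
    return f"{t1}:{t2}"


def myfunc_batch(input_text_series: pd.Series, param2_series: pd.Series) -> pd.Series:
    """Apply myfunc to each (input_text, param2) pair in the batch."""
    # Series.combine aligns the two Series on their index and calls
    # myfunc(a, b) per element, which is what the apply(...) call intended.
    return input_text_series.combine(param2_series, myfunc)


def add_result(df):
    from pyspark.sql.functions import pandas_udf

    myfunc_udf = pandas_udf(myfunc_batch, returnType="string")
    return df.withColumn("result", myfunc_udf("input_text1", "input_text2"))
```

Within one pandas UDF batch both Series share the same index, so the index alignment done by combine pairs values row by row as expected.
    -->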
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59639#M6832</link>
      <description>&lt;P&gt;Hello Sanjay,&lt;/P&gt;&lt;P&gt;The code above doesn't define df. Can you share your df.show() output?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 05:25:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/59639#M6832</guid>
      <dc:creator>BR_DatabricksAI</dc:creator>
      <dc:date>2024-02-08T05:25:35Z</dc:date>
    </item>
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/60208#M6833</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, I would appreciate it if you could help resolve this issue.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Sanjay&lt;/P&gt;</description>
      <pubDate>Wed, 14 Feb 2024 14:42:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/60208#M6833</guid>
      <dc:creator>sanjay</dc:creator>
      <dc:date>2024-02-14T14:42:28Z</dc:date>
    </item>
    <item>
      <title>Re: Performance issue while calling mlflow endpoint</title>
      <link>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/64042#M6834</link>
      <description>&lt;P&gt;So good&lt;/P&gt;</description>
      <pubDate>Tue, 19 Mar 2024 02:19:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/performance-issue-while-calling-mlflow-endpoint/m-p/64042#M6834</guid>
      <dc:creator>Isabeente</dc:creator>
      <dc:date>2024-03-19T02:19:56Z</dc:date>
    </item>
  </channel>
</rss>

