<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Limiting parallelism when external APIs are invoked (i.e. mlflow) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31522#M22960</link>
    <description>&lt;P&gt;That's why I asked how to limit parallelism, not what a 429 error means &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; That one I already know. We got an answer from our RA, a service we pay for as professional customers. It looks like this community is pretty useless if the experts do not participate &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 24 Feb 2022 15:12:19 GMT</pubDate>
    <dc:creator>Edmondo</dc:creator>
    <dc:date>2022-02-24T15:12:19Z</dc:date>
    <item>
      <title>Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31517#M22955</link>
      <description>&lt;P&gt;We are applying a groupBy operation to a pyspark.sql.DataFrame and then, on each group, training a single model tracked with MLflow. We see intermittent failures because the MLflow server replies with HTTP 429 (too many requests per second).&lt;/P&gt;
&lt;P&gt;What are the best practices in such cases, and how do you limit outgoing calls to an external service? We are using managed MLflow in Databricks; is there a way to configure MLflow so that it queues subsequent requests before sending them to the server?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Mar 2025 13:28:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31517#M22955</guid>
      <dc:creator>Edmondo</dc:creator>
      <dc:date>2025-03-21T13:28:53Z</dc:date>
    </item>
    <item>
      <title>Re: Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31518#M22956</link>
      <description>&lt;P&gt;At least in Azure, the MLflow rate limits per workspace are quite strict:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Low-throughput experiment management (list, update, delete, restore): 7 qps&lt;/LI&gt;&lt;LI&gt;Search runs: 7 qps&lt;/LI&gt;&lt;LI&gt;Log batch: 47 qps&lt;/LI&gt;&lt;LI&gt;All other APIs: 127 qps&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;(qps = queries per second.) In addition, each workspace is limited to 20 concurrent model versions in Pending status (i.e. being created). Also, requests that receive a 429 are automatically retried.&lt;/P&gt;&lt;P&gt;Are the models trained in parallel, one per group? Instead of training in parallel, you could train one group at a time and monitor executor usage: utilization may be close to 100% anyway, so it could take about the same time.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jan 2022 19:49:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31518#M22956</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-01-17T19:49:09Z</dc:date>
    </item>
    <item>
      <title>Re: Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31519#M22957</link>
      <description>&lt;P&gt;Thanks, according to the documentation the limits are the same on AWS (I had checked that). So there are three options:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;the UDF runs with at most 7 instances in parallel (how do I do that?)&lt;/LI&gt;&lt;LI&gt;MLflow calls are queued (again, how do I add a stateful queue shared across all cluster nodes?)&lt;/LI&gt;&lt;LI&gt;or I use some sort of locking/coordination mechanism (is there anything already available, or should I set up a ZooKeeper instance?)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 17 Jan 2022 19:52:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31519#M22957</guid>
      <dc:creator>Edmondo</dc:creator>
      <dc:date>2022-01-17T19:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31520#M22958</link>
      <description>&lt;P&gt;@Edmondo Porcu&amp;nbsp;- My name is Piper, and I'm a moderator for Databricks. I apologize for taking so long to respond. We are looking for the best person to help you.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Feb 2022 17:02:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31520#M22958</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-02-16T17:02:32Z</dc:date>
    </item>
    <item>
      <title>Re: Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31522#M22960</link>
      <description>&lt;P&gt;That's why I asked how to limit parallelism, not what a 429 error means &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; That one I already know. We got an answer from our RA, a service we pay for as professional customers. It looks like this community is pretty useless if the experts do not participate &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 15:12:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31522#M22960</guid>
      <dc:creator>Edmondo</dc:creator>
      <dc:date>2022-02-24T15:12:19Z</dc:date>
    </item>
    <item>
      <title>Re: Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31524#M22962</link>
      <description>&lt;P&gt;Yes. I confirm there is no sign of an answer to my question: "how to limit parallelism"&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 15:26:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31524#M22962</guid>
      <dc:creator>Edmondo</dc:creator>
      <dc:date>2022-02-24T15:26:09Z</dc:date>
    </item>
    <item>
      <title>Re: Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31526#M22964</link>
      <description>&lt;P&gt;For me it has already been resolved through professional services. The question I do have is how useful this community is if people with the right background aren't here, and if it takes a month to get a non-answer.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 15:57:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31526#M22964</guid>
      <dc:creator>Edmondo</dc:creator>
      <dc:date>2022-02-24T15:57:12Z</dc:date>
    </item>
    <item>
      <title>Re: Limiting parallelism when external APIs are invoked (i.e. mlflow)</title>
      <link>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31528#M22966</link>
      <description>&lt;P&gt;@Edmondo Porcu&amp;nbsp;- Thank you for letting us know about your concerns. I apologize that you had to wait so long. We are working on our procedures to alleviate the situation.&lt;/P&gt;&lt;P&gt;Thanks again!&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 16:35:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/limiting-parallelism-when-external-apis-are-invoked-i-e-mlflow/m-p/31528#M22966</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-02-24T16:35:47Z</dc:date>
    </item>
  </channel>
</rss>

