<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Asynchronous API calls from Databricks Workflow job in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/asynchronous-api-calls-from-databricks-workflow-job/m-p/70007#M33964</link>
    <description>&lt;P&gt;Would you mind sharing the cluster setup for both cases? I'd make sure that databricks Runtime is the same for both and check the number of workers allocated in each cluster.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 20 May 2024 16:52:33 GMT</pubDate>
    <dc:creator>mhiltner</dc:creator>
    <dc:date>2024-05-20T16:52:33Z</dc:date>
    <item>
      <title>Asynchronous API calls from Databricks Workflow job</title>
      <link>https://community.databricks.com/t5/data-engineering/asynchronous-api-calls-from-databricks-workflow-job/m-p/68754#M33736</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I have many API calls to run on a python Databricks notebook which I then run regularly on a Databricks Workflow job. When I test the following code on an all purpose cluster locally i.e. not via a job, it runs perfectly fine. However, when I run the same notebook on a job it does not work anymore. The calls are run sequentially instead of in parallel. Does anyone know why and what I can do to fix it?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;Here is my code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; asyncio&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; requests&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; nest_asyncio&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;nest_asyncio.&lt;/SPAN&gt;&lt;SPAN&gt;apply&lt;/SPAN&gt;&lt;SPAN&gt;()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;async&lt;/SPAN&gt; &lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;with_threads&lt;/SPAN&gt;&lt;SPAN&gt;():&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;make_request&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;response &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; requests.&lt;/SPAN&gt;&lt;SPAN&gt;get&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;'&lt;A href="https://www.google.com" target="_blank" rel="noopener"&gt;https://www.google.com&lt;/A&gt;'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;return&lt;/SPAN&gt;&lt;SPAN&gt; response&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;reqs &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; [asyncio.&lt;/SPAN&gt;&lt;SPAN&gt;to_thread&lt;/SPAN&gt;&lt;SPAN&gt;(make_request) &lt;/SPAN&gt;&lt;SPAN&gt;for&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;_&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;in&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;range(0,20)]&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;responses &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;await&lt;/SPAN&gt;&lt;SPAN&gt; asyncio.&lt;/SPAN&gt;&lt;SPAN&gt;gather&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;reqs)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;return&lt;/SPAN&gt;&lt;SPAN&gt; responses&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;async_result &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; asyncio.&lt;/SPAN&gt;&lt;SPAN&gt;run&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;with_threads&lt;/SPAN&gt;&lt;SPAN&gt;())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;PS: The request and loop is different in my original code and only used here to explain the problem.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 10 May 2024 15:55:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/asynchronous-api-calls-from-databricks-workflow-job/m-p/68754#M33736</guid>
      <dc:creator>pjv</dc:creator>
      <dc:date>2024-05-10T15:55:58Z</dc:date>
    </item>
    <item>
      <title>Re: Asynchronous API calls from Databricks Workflow job</title>
      <link>https://community.databricks.com/t5/data-engineering/asynchronous-api-calls-from-databricks-workflow-job/m-p/70007#M33964</link>
      <description>&lt;P&gt;Would you mind sharing the cluster setup for both cases? I'd make sure that databricks Runtime is the same for both and check the number of workers allocated in each cluster.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2024 16:52:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/asynchronous-api-calls-from-databricks-workflow-job/m-p/70007#M33964</guid>
      <dc:creator>mhiltner</dc:creator>
      <dc:date>2024-05-20T16:52:33Z</dc:date>
    </item>
    <item>
      <title>Re: Asynchronous API calls from Databricks Workflow job</title>
      <link>https://community.databricks.com/t5/data-engineering/asynchronous-api-calls-from-databricks-workflow-job/m-p/70109#M33993</link>
      <description>&lt;P&gt;I actually got it too work though I do see that if I run two jobs of the same code in parallel the async execution time slows down. Do the number of workers of the cluster on which the parallel jobs are run effect the execution time of async calls of the jobs?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Here is the code that I got to run:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# Asynchronous function to fetch data from a given URL using aiohttp&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;async&lt;/SPAN&gt; &lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;fetch_data&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;session&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;url&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;async&lt;/SPAN&gt; &lt;SPAN&gt;with&lt;/SPAN&gt; &lt;SPAN&gt;session&lt;/SPAN&gt;&lt;SPAN&gt;.get(&lt;/SPAN&gt;&lt;SPAN&gt;url&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt; &lt;SPAN&gt;response&lt;/SPAN&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;await&lt;/SPAN&gt; &lt;SPAN&gt;response&lt;/SPAN&gt;&lt;SPAN&gt;.json()&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# Asynchronous main function&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;async&lt;/SPAN&gt; &lt;SPAN&gt;def&lt;/SPAN&gt; &lt;SPAN&gt;get_url_data&lt;/SPAN&gt;&lt;SPAN&gt;(input_args&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# List of URLs to fetch data from&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;urls&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; [&lt;/SPAN&gt;&lt;SPAN&gt;get_api_url&lt;/SPAN&gt;&lt;SPAN&gt;(input_arg&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;for&lt;/SPAN&gt;&amp;nbsp;input_arg&amp;nbsp;&lt;SPAN&gt;in&lt;/SPAN&gt; &lt;SPAN&gt;input_args&lt;/SPAN&gt;&lt;SPAN&gt;]&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;headers&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; {&lt;/SPAN&gt;&lt;SPAN&gt;'X-API-KEY'&lt;/SPAN&gt;&lt;SPAN&gt;: "&amp;lt;API_KEY&amp;gt;"}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# Create an aiohttp ClientSession for making asynchronous HTTP requests&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;async&lt;/SPAN&gt; &lt;SPAN&gt;with&lt;/SPAN&gt; &lt;SPAN&gt;aiohttp&lt;/SPAN&gt;&lt;SPAN&gt;.ClientSession(&lt;/SPAN&gt;&lt;SPAN&gt;headers&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;headers&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt; &lt;SPAN&gt;session&lt;/SPAN&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# Create a list of tasks, where each task is a call to 'fetch_data' with a specific URL&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;tasks&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; [&lt;/SPAN&gt;&lt;SPAN&gt;fetch_data&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;session&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;url&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;for&lt;/SPAN&gt; &lt;SPAN&gt;url&lt;/SPAN&gt; &lt;SPAN&gt;in&lt;/SPAN&gt; &lt;SPAN&gt;urls&lt;/SPAN&gt;&lt;SPAN&gt;]&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# Use 'asyncio.gather()' to run the tasks concurrently and gather their results&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;results&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;await&lt;/SPAN&gt; &lt;SPAN&gt;asyncio&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;gather&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;&lt;SPAN&gt;tasks&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;return_exceptions&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;False&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# Print the results obtained from fetching data from each URL&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;return&lt;/SPAN&gt; &lt;SPAN&gt;results&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 21 May 2024 12:34:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/asynchronous-api-calls-from-databricks-workflow-job/m-p/70109#M33993</guid>
      <dc:creator>pjv</dc:creator>
      <dc:date>2024-05-21T12:34:01Z</dc:date>
    </item>
  </channel>
</rss>

