Asynchronous API calls from Databricks
05-08-2023 09:43 AM
Hi,
I have to send thousands of calls from a Databricks notebook to an API to retrieve some data.
Right now, I am using a sequential approach with the Python requests package. As the performance is no longer acceptable, I need to send my API calls in parallel. I started changing my ingestion notebook to use the asyncio and aiohttp packages.
I am not an expert in these 2 packages, but I keep getting one error that I cannot explain:
asyncio.run() cannot be called from a running event loop
which means that there is already an event loop but there shouldn't be an existing event loop!
When I run
loop = asyncio.get_running_loop()
I get a running loop, even outside the main program. I know that Jupyter notebooks always have a running event loop, is it the same for Databricks?
Does anyone have some experience with these 2 packages in Databricks?
Is there a better way to handle asynchronous HTTP calls from Databricks?
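The check described above can be wrapped in a small helper. This is only a minimal sketch using the standard library; the names `has_running_loop` and `check_inside_coroutine` are made up for illustration. It relies on the fact that `asyncio.get_running_loop()` raises `RuntimeError` when no loop is running in the current thread, which is the same condition that makes `asyncio.run()` refuse to start.

```python
import asyncio


def has_running_loop() -> bool:
    """Return True if an event loop is already running in this thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running.
        return False
    return True


async def check_inside_coroutine() -> bool:
    # A coroutine only executes while a loop is running, so this returns True.
    return has_running_loop()
```

If `has_running_loop()` returns True at the top level of a notebook cell, a plain `asyncio.run(...)` call in that cell will fail with the error quoted above.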
Labels: API
05-12-2023 09:08 AM
You could try adding this:
import nest_asyncio
nest_asyncio.apply()
Then call your coroutine like this:
asyncio.run(your_method())
Don't forget to install the nest_asyncio package.
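For the "thousands of calls" case, it may also help to cap how many requests are in flight at once. The following is a rough sketch, not a tested ingestion routine: `fetch_one` is a hypothetical stand-in that sleeps instead of making a real aiohttp request, and `fetch_all` and `max_concurrency` are names invented for this example.

```python
import asyncio


async def fetch_one(item_id: int) -> dict:
    # Hypothetical placeholder: a real version would issue an aiohttp
    # request here. The sleep only simulates network latency.
    await asyncio.sleep(0.01)
    return {"id": item_id, "status": 200}


async def fetch_all(item_ids, max_concurrency: int = 20) -> list:
    # The semaphore limits how many fetch_one() calls run at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(item_id: int) -> dict:
        async with semaphore:
            return await fetch_one(item_id)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded_fetch(i) for i in item_ids))
```

In a notebook where a loop is already running, this would be started with `nest_asyncio.apply()` followed by `asyncio.run(fetch_all(ids))`, as suggested above. Awaiting `fetch_all(ids)` directly in a cell may also work, depending on the runtime.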
05-17-2023 11:11 AM
This worked for me - thanks, wasn't aware of this package.
FYI, nest_asyncio is included in the Databricks runtime by default since 10.4
05-17-2023 11:13 AM
I also ran into this error and cascode's response resolved it.
Of note, this error didn't pop up for me when running the code on an all-purpose cluster, only on a new job cluster.
05-19-2023 01:38 AM
Hi @Paul Poco
Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
09-25-2024 05:46 AM - edited 09-25-2024 05:47 AM
Hey @Paul_Poco, what about using ProcessPoolExecutor or ThreadPoolExecutor from the concurrent.futures module? Have you tried them?
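A thread pool version could look roughly like this. It is a sketch under assumptions: `call_api` is a hypothetical stand-in that sleeps instead of making a blocking `requests` call, and `fetch_all_threaded` and `max_workers=16` are illustrative choices, not recommendations.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_api(item_id: int) -> dict:
    # Hypothetical placeholder: a real version would make a blocking
    # HTTP request here. The sleep only simulates network latency.
    time.sleep(0.01)
    return {"id": item_id, "status": 200}


def fetch_all_threaded(item_ids, max_workers: int = 16) -> list:
    # Threads overlap the time spent waiting on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in the same order as the inputs.
        return list(pool.map(call_api, item_ids))
```

Because this approach does not use an event loop at all, the `asyncio.run()` error does not come up. Threads are usually the better fit for I/O-bound work like HTTP calls, while ProcessPoolExecutor is aimed more at CPU-bound work.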