I have to send thousands of API calls from a Databricks notebook to an external API to retrieve some data.
Until now I have used a sequential approach with the Python requests package. Since the performance is no longer acceptable, I need to send my API calls in parallel, so I started rewriting my ingestion notebook using the asyncio and aiohttp packages.
I am not an expert in these two packages, and I keep hitting one error that I cannot explain:
asyncio.run() cannot be called from a running event loop
which means an event loop is already running, but as far as I can tell there shouldn't be one!
Indeed, when I run
loop = asyncio.get_running_loop()
I do get a running loop, even outside the main program. I know that Jupyter notebooks always have a running event loop; is the same true for Databricks?
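For what it's worth, here is a minimal sketch of the workaround I am experimenting with: detect whether a loop is already running, and if so, hand the coroutine to asyncio.run() on a separate thread instead of calling it directly (the function and coroutine names are just placeholders):

```python
import asyncio
import concurrent.futures


async def fetch_all():
    # Placeholder coroutine standing in for the real ingestion work.
    await asyncio.sleep(0)
    return "done"


def run_coroutine(coro):
    """Run a coroutine whether or not an event loop is already running.

    In a plain script there is no running loop, so asyncio.run() is safe.
    In a notebook (Jupyter, and apparently Databricks) a loop is already
    running, so asyncio.run() would raise; instead we run the coroutine
    in a fresh loop on a worker thread.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running: asyncio.run() is safe here.
        return asyncio.run(coro)
    # A loop is already running (notebook case): avoid the
    # "cannot be called from a running event loop" error.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


print(run_coroutine(fetch_all()))  # prints "done" when run as a plain script
```

In a notebook cell one can also simply `await fetch_all()` at top level, since the cell already runs inside the notebook's event loop.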
Does anyone have experience with these two packages on Databricks?
Is there a better way to handle asynchronous HTTP calls from Databricks?
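To illustrate the pattern I am aiming for: fire off all calls with asyncio.gather and cap concurrency with a semaphore. This sketch uses asyncio.sleep as a stand-in for the HTTP request; with aiohttp, call_api would instead do `async with session.get(url) as resp: return await resp.json()` inside a shared aiohttp.ClientSession. All names here are illustrative:

```python
import asyncio
import time


async def call_api(sem, i):
    # Stand-in for one HTTP request; replace the sleep with an
    # aiohttp session.get(...) call in the real notebook.
    async with sem:
        await asyncio.sleep(0.05)
        return i


async def fetch_all(n, max_concurrency=20):
    # Semaphore keeps at most max_concurrency requests in flight,
    # so thousands of calls don't overwhelm the API.
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(call_api(sem, i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(fetch_all(50))  # in a notebook: await fetch_all(50)
elapsed = time.perf_counter() - start
print(len(results), round(elapsed, 2))
```

Fifty sequential 0.05 s calls would take about 2.5 s; with 20 in flight at a time they finish in roughly 0.15 s, and gather preserves the input order of the results.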