Asynchronous API calls from Databricks

I have to send thousands of API calls from a Databricks notebook to an API to retrieve some data.

Right now, I am using a sequential approach with the Python requests package. As the performance is no longer acceptable, I need to send my API calls in parallel. I started changing my ingestion notebook to use the asyncio and aiohttp packages.

I am not an expert in these two packages, but I keep getting one error that I cannot explain: cannot be called from a running event loop

This means that an event loop is already running, but there shouldn't be one!

When I run

loop = asyncio.get_running_loop()

I get a running loop, even outside the main program. I know that Jupyter notebooks always have a running event loop; is it the same for Databricks?

Does anyone have some experience with these 2 packages in Databricks?

Is there a better way to handle asynchronous HTTP calls from Databricks?


You could try adding this:

import nest_asyncio


And in your code calling it like this:

Don't forget to install the nest_asyncio package.
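For reference, here is a minimal sketch of how nest_asyncio is commonly used, assuming the usual pattern of calling nest_asyncio.apply() once before asyncio.run(). The names fetch_one, fetch_all and MAX_CONCURRENCY are hypothetical, and fetch_one only simulates a request with a short sleep so the example runs without network access. The import is wrapped in try/except so the sketch also runs where the package is not installed and no loop is already running.

```python
import asyncio

try:
    # Third-party package: patches asyncio so asyncio.run() can be
    # called even when an event loop is already running.
    import nest_asyncio
    nest_asyncio.apply()
except ImportError:
    # Without the patch, asyncio.run() only works when no loop is running.
    pass

MAX_CONCURRENCY = 20  # assumed limit; tune to what the API tolerates


async def fetch_one(item_id: int) -> dict:
    """Hypothetical stand-in for a real HTTP request (e.g. made with aiohttp)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"id": item_id, "status": "ok"}


async def fetch_all(item_ids: list[int]) -> list[dict]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded_fetch(item_id: int) -> dict:
        # Limit how many requests are in flight at the same time.
        async with semaphore:
            return await fetch_one(item_id)

    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(bounded_fetch(i) for i in item_ids))


results = asyncio.run(fetch_all(list(range(100))))
```

In a real notebook, the body of fetch_one would be replaced by an actual request, for example through a shared aiohttp.ClientSession. The semaphore is a design choice to avoid sending thousands of requests at once.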

This worked for me - thanks, wasn't aware of this package.

FYI, nest_asyncio has been included in the Databricks runtime by default since 10.4.

I also ran into this error, and cascode's response resolved it.

Of note, this error didn't pop up for me when running the code on an all-purpose cluster, only on a new job cluster.

