Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Asynchronous API calls from Databricks

Paul_Poco
New Contributor II

Hi,

I need to send thousands of calls from a Databricks notebook to an API to retrieve data.

Right now, I make the calls sequentially with the Python requests package. The performance is no longer acceptable, so I need to send the calls concurrently. I have started rewriting my ingestion notebook to use the asyncio and aiohttp packages.
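
A minimal sketch of the concurrent pattern described here, assuming you want to cap the number of in-flight requests. The `fetch` coroutine is a hypothetical stand-in that only sleeps; with aiohttp it would issue the real request through a shared `ClientSession`. `fetch_all` and `max_concurrency` are illustrative names, not part of any library. The final `asyncio.run()` call works in a plain Python process; in the notebook, that call is exactly where the error below appears.

```python
import asyncio

# Hypothetical stand-in for one API request. With aiohttp this would be
# something like: async with session.get(url) as resp: return await resp.json()
async def fetch(item_id: int) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": item_id}

async def fetch_all(item_ids, max_concurrency: int = 20) -> list:
    # Cap the number of requests in flight so the API is not flooded.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(item_id: int) -> dict:
        async with semaphore:
            return await fetch(item_id)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded_fetch(i) for i in item_ids))

results = asyncio.run(fetch_all(range(100)))
print(len(results))  # 100
```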

I am not an expert in these two packages, but I keep getting an error that I cannot explain:

asyncio.run() cannot be called from a running event loop

This means an event loop is already running, but I don't expect one to exist!
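
The error can be reproduced outside a notebook. The sketch below assumes the notebook runs cell code while an event loop is already active (as Jupyter does) and simulates that by calling `asyncio.run()` from inside a coroutine; `work` and `notebook_cell` are hypothetical names.

```python
import asyncio

async def work() -> int:
    return 42

async def notebook_cell() -> str:
    # This coroutine runs while an event loop is already running,
    # which is roughly the situation a notebook cell is in.
    coro = work()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return str(exc)
    return "no error"

message = asyncio.run(notebook_cell())
print(message)  # asyncio.run() cannot be called from a running event loop
```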

When I run

loop = asyncio.get_running_loop()

I get a running loop, even outside the main program. I know that Jupyter notebooks always have a running event loop; is it the same for Databricks?
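
`asyncio.get_running_loop()` raises `RuntimeError` when no loop is running in the current thread, so it can be wrapped into a check. `has_running_loop` is a hypothetical helper: in a plain script it returns False at module level, while in a notebook that behaves like Jupyter it would be expected to return True at cell level.

```python
import asyncio

def has_running_loop() -> bool:
    """Return True if the current thread already has a running event loop."""
    try:
        asyncio.get_running_loop()  # raises RuntimeError if no loop is running
        return True
    except RuntimeError:
        return False

async def inside_loop() -> bool:
    return has_running_loop()

print(has_running_loop())          # False in a plain Python script
print(asyncio.run(inside_loop()))  # True: called from inside a coroutine
```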

Does anyone have experience with these two packages in Databricks?

Is there a better way to make asynchronous HTTP calls from Databricks?


cascode
New Contributor II

You could try adding this:

import nest_asyncio

nest_asyncio.apply()

Then call your coroutine like this:

asyncio.run(your_method())

Don't forget to install the nest_asyncio package.

SCWD
New Contributor III

This worked for me, thanks. I wasn't aware of this package.

FYI, nest_asyncio has been included in the Databricks Runtime by default since version 10.4.

SCWD
New Contributor III

I also ran into this error, and cascode's response resolved it.

Of note, this error didn't appear when I ran the code on an all-purpose cluster, only on a new job cluster.
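
Since whether a loop is already running can differ between environments (here, all-purpose versus job clusters), a helper that handles both cases avoids depending on the cluster type. A sketch, assuming it is acceptable to block the calling thread until the coroutine finishes: `run_sync` is a hypothetical name, and instead of patching asyncio the way nest_asyncio does, it runs `asyncio.run()` on a worker thread, which has no running loop of its own.

```python
import asyncio
import concurrent.futures

def run_sync(coro):
    """Run a coroutine to completion whether or not a loop is already running.

    Hypothetical helper: with no running loop, call asyncio.run() directly;
    otherwise run asyncio.run() in a worker thread, which gets its own loop.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)
    # Blocks this thread (and the outer loop) until the worker finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()

async def answer() -> int:
    await asyncio.sleep(0)
    return 42

async def simulated_notebook_cell() -> int:
    # A loop is running here, so a plain asyncio.run() would raise RuntimeError.
    return run_sync(answer())

print(run_sync(answer()))                      # 42 (no loop running)
print(asyncio.run(simulated_notebook_cell()))  # 42 (loop already running)
```

One caveat of this design: the coroutine runs on a different loop, so objects bound to the outer loop (for example, an aiohttp session created in the notebook's loop) should not be reused inside it.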

Anonymous
Not applicable

Hi @Paul Poco,

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

adarsh8304
New Contributor II

Hey @Paul_Poco, what about using ProcessPoolExecutor or ThreadPoolExecutor from the concurrent.futures module? Have you tried them?
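
A minimal sketch of the thread-pool approach, which sidesteps the event loop question entirely because it uses plain blocking calls. `fetch` is a hypothetical stand-in for a `requests.get(...)` call (`BASE_URL` in the comment is also hypothetical), and `max_workers=20` is an arbitrary illustrative value to tune against the API's rate limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(item_id: int) -> dict:
    # Hypothetical stand-in for one blocking API call, e.g.
    # requests.get(f"{BASE_URL}/{item_id}", timeout=30).json()
    time.sleep(0.01)  # simulate network latency
    return {"id": item_id}

item_ids = list(range(100))

# Threads suit I/O-bound work like HTTP calls: each thread spends most of
# its time waiting on the network, so the GIL is not the bottleneck.
with ThreadPoolExecutor(max_workers=20) as pool:
    # map() yields results in the same order as item_ids.
    results = list(pool.map(fetch, item_ids))

print(len(results))  # 100
```

Threads are usually a better fit than processes here: HTTP calls are I/O-bound, and ProcessPoolExecutor adds process startup and pickling overhead that mainly pays off for CPU-bound work.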
