Databricks workflows for APIs with different frequencies (cluster keeps restarting)

mordex
New Contributor III
 
 


Hey everyone,

I’m stuck with a Databricks workflow design and could use some advice.

Currently, we are calling 70+ APIs.

Right now the workflow looks something like:
Task1 → Task2 → ForEach → notebook (API calls)

However, there is a new requirement that each API be called at its own frequency: some every 1 minute, some every 2 minutes, some every 5 minutes. The solution also has to be generalized.

In Task1 we read a view that stores every API, its path, and its apicallfreq.
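To make the frequency part concrete, here is a minimal sketch of how the rows from that view could be filtered down to the APIs that are due on a given tick. It assumes apicallfreq is a whole number of minutes; the `ApiConfig` class, the `due_apis` helper, and the sample rows are hypothetical and only stand in for whatever the view actually returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiConfig:
    api: str
    path: str
    apicallfreq: int  # assumption: frequency in whole minutes


def due_apis(configs, minute_index):
    """Return the configs whose frequency divides the given minute counter.

    minute_index is a running counter of elapsed minutes (0, 1, 2, ...).
    Rows with a frequency below 1 are skipped instead of raising.
    """
    return [
        c for c in configs
        if c.apicallfreq >= 1 and minute_index % c.apicallfreq == 0
    ]


# Hypothetical sample rows standing in for the view's contents.
configs = [
    ApiConfig("orders", "/orders", 1),
    ApiConfig("users", "/users", 2),
    ApiConfig("stock", "/stock", 5),
]

for minute in range(6):
    print(minute, [c.api for c in due_apis(configs, minute)])
```

The list returned for each tick could then be the input to the ForEach step, so only the APIs that are due get called.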

We’re using job clusters, and the problem is:

  • Cluster spins up
  • Runs the job
  • Terminates immediately
  • Next run starts → spins up again

So for 1-min jobs, it’s basically constantly restarting clusters, which is not really feasible (time + cost).

We looked into:

  • Continuous jobs → but that doesn’t really work for us because we need task dependencies + ForEach
  • Cron scheduling → same issue, cluster keeps terminating after each run

Has anyone handled something similar?

  • Did you move everything into a single notebook and manage scheduling inside?
  • Use an all-purpose cluster instead?
  • Or is there a better pattern for handling different API frequencies?
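For the first option, this is roughly what "manage scheduling inside" could look like: one long-running loop that keeps a priority queue of next-due times, so the cluster stays up and each API is called on its own interval. It is only a sketch under assumptions, not something we run today: `run_scheduler`, `call_api`, and `max_calls` are hypothetical names, intervals are in seconds, and error handling and retries are left out.

```python
import heapq
import time


def run_scheduler(freqs, call_api, max_calls, now=time.monotonic, sleep=time.sleep):
    """Call each API on its own interval from a single loop.

    freqs:     dict of api name -> interval in seconds (assumed unit)
    call_api:  hypothetical callable that performs one API call by name
    max_calls: stop after this many calls so the loop is bounded
    now/sleep: injectable clock functions, which makes the loop testable
    """
    start = now()
    # Every API is due immediately at start; the heap orders by due time.
    queue = [(start, name) for name in sorted(freqs)]
    heapq.heapify(queue)

    for _ in range(max_calls):
        due_at, name = heapq.heappop(queue)
        delay = due_at - now()
        if delay > 0:
            sleep(delay)  # wait until the earliest API is due
        call_api(name)
        # Schedule from the planned time, not the finish time, to avoid drift.
        heapq.heappush(queue, (due_at + freqs[name], name))
```

In a real notebook `call_api` would issue the HTTP request for that row of the view, and `max_calls` would be replaced by whatever stop condition fits the job. A slow call delays the ones queued behind it, since everything runs on one thread.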

Would really appreciate any practical suggestions from real-world setups.

Thanks!