<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Databricks Jobs API - Throttling in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112773#M44322</link>
    <description>Discussion of Databricks Jobs API rate limits and throttling in the Data Engineering forum.</description>
    <pubDate>Mon, 17 Mar 2025 08:17:46 GMT</pubDate>
    <dc:creator>koji_kawamura</dc:creator>
    <dc:date>2025-03-17T08:17:46Z</dc:date>
    <item>
      <title>Databricks Jobs API - Throttling</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112313#M44174</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;I am planning to run a script that fetches Databricks job statuses every 10 minutes. I have around 500 jobs in my workspace. The APIs I use are shown below: list runs, get all job runs.&lt;/P&gt;&lt;P&gt;I was wondering whether this could cause throttling, since the Jobs APIs have rate limits. I would like to know if there are better ways to handle this use case, apart from adding logic to handle throttling.&lt;/P&gt;&lt;P&gt;On a side note, if throttling occurs, will other important jobs in the workspace fail (say, fail to launch)? Or will they simply be retried once the throttling subsides?&lt;/P&gt;&lt;PRE&gt;# Function to get all job runs within the date range
def get_all_job_runs(start_time, end_time):
    all_runs = []
    has_more = True
    offset = 0
    limit = 25  # Adjust the limit as needed

    while has_more:
        job_runs = db.jobs.list_runs(
            active_only=False,
            start_time_from=start_time,
            start_time_to=end_time,
            offset=offset,
            limit=limit
        )
        all_runs.extend(job_runs['runs'])
        has_more = job_runs.get('has_more', False)
        offset += limit

    return all_runs

# Get all job runs for the given date range
job_runs = get_all_job_runs(start_time, end_time)&lt;/PRE&gt;</description>
      <pubDate>Tue, 11 Mar 2025 23:25:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112313#M44174</guid>
      <dc:creator>noorbasha534</dc:creator>
      <dc:date>2025-03-11T23:25:15Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Jobs API - Throttling</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112773#M44322</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/124839"&gt;@noorbasha534&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Different rate limits are enforced per API endpoint. The "/jobs/runs/list" endpoint is limited to 30 requests per second, while the number of concurrently running tasks is capped at 2000. These limits work independently: exceeding the list API rate limit can produce a 429 response, but it should not block the execution of a new job.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/resources/limits#api-rate-limits" target="_blank"&gt;https://docs.databricks.com/aws/en/resources/limits#api-rate-limits&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;With about 500 jobs and a page size of 25, your script makes roughly 20 paginated calls per polling cycle. Even if those calls were all issued within a single second, that stays below the 30 requests/second limit, but as the number of jobs grows you may start hitting it.&lt;/P&gt;
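If you do keep polling the API, pacing the paginated calls and retrying on 429 keeps the script resilient as the job count grows. A minimal sketch of that loop, where the page fetcher (`list_page`) and its `RateLimited` signal are placeholders standing in for whatever client you use, not an actual SDK API:

```python
import time

class RateLimited(Exception):
    """Raised by list_page when the API responds with HTTP 429."""

def fetch_all_runs(list_page, limit=25, base_delay=1.0, max_retries=5):
    """Paginate through job runs, retrying each page with exponential
    backoff whenever the fetcher signals a rate limit."""
    runs, offset = [], 0
    while True:
        delay = base_delay
        for _ in range(max_retries):
            try:
                page = list_page(offset=offset, limit=limit)
                break
            except RateLimited:
                time.sleep(delay)  # wait before retrying the same page
                delay *= 2         # exponential backoff
        else:
            raise RuntimeError("still rate limited after retries")
        runs.extend(page["runs"])
        if not page.get("has_more", False):
            return runs
        offset += limit
```

Because the fetcher is injected, the same loop works whether the pages come from raw REST calls or an SDK wrapper, and the backoff only delays the throttled page rather than aborting the whole sweep.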
&lt;P&gt;Alternatively, depending on your requirements, &lt;A href="https://docs.databricks.com/aws/en/admin/system-tables/jobs" target="_blank"&gt;system tables&lt;/A&gt; may be helpful. For example, the following SQL statement retrieves many job runs in a single query:&lt;/P&gt;
&lt;DIV&gt;
&lt;PRE&gt;SELECT * FROM job_run_timeline
WHERE workspace_id = "&amp;lt;workspace-id&amp;gt;"
AND period_start_time &amp;gt;= "2025-03-15T09:00:00"
AND period_end_time &amp;lt;= "2025-03-15T10:00:00"&lt;/PRE&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 17 Mar 2025 08:17:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112773#M44322</guid>
      <dc:creator>koji_kawamura</dc:creator>
      <dc:date>2025-03-17T08:17:46Z</dc:date>
    </item>
  </channel>
</rss>

