<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Databricks Jobs API - Throttling in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112773#M44322</link>
    <description>Discussion of Databricks Jobs API rate limits and throttling in the Data Engineering forum.</description>
    <pubDate>Mon, 17 Mar 2025 08:17:46 GMT</pubDate>
    <dc:creator>koji_kawamura</dc:creator>
    <dc:date>2025-03-17T08:17:46Z</dc:date>
    <item>
      <title>Databricks Jobs API - Throttling</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112313#M44174</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;I am planning to run a script that fetches Databricks job statuses every 10 minutes. I have around 500 jobs in my workspace. The APIs I use are shown below: list runs, get all job runs.&lt;/P&gt;&lt;P&gt;I was wondering whether this could cause throttling, since the Jobs APIs have rate limits. I would like to know if there are better ways to handle this use case, apart from adding logic to handle throttling.&lt;/P&gt;&lt;P&gt;On a side note, if throttling occurs, will other important jobs in the workspace fail (say, fail to launch)? Or will they simply be retried once the throttling subsides?&lt;/P&gt;&lt;PRE&gt;# Function to get all job runs within the date range
def get_all_job_runs(start_time, end_time):
    all_runs = []
    has_more = True
    offset = 0
    limit = 25  # Adjust the limit as needed

    while has_more:
        job_runs = db.jobs.list_runs(
            active_only=False,
            start_time_from=start_time,
            start_time_to=end_time,
            offset=offset,
            limit=limit
        )
        all_runs.extend(job_runs['runs'])
        has_more = job_runs.get('has_more', False)
        offset += limit

    return all_runs

# Get all job runs for the given date range
job_runs = get_all_job_runs(start_time, end_time)&lt;/PRE&gt;</description>
      <pubDate>Tue, 11 Mar 2025 23:25:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112313#M44174</guid>
      <dc:creator>noorbasha534</dc:creator>
      <dc:date>2025-03-11T23:25:15Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Jobs API - Throttling</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112773#M44322</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/124839"&gt;@noorbasha534&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Different rate limits are enforced per API endpoint. The "/jobs/runs/list" endpoint is limited to 30 requests per second, while the number of concurrently running tasks is capped at 2000. These limits work independently: exceeding the list API rate limit can produce a 429 response, but it should not block the execution of a new job.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/resources/limits#api-rate-limits" target="_blank"&gt;https://docs.databricks.com/aws/en/resources/limits#api-rate-limits&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;With about 500 jobs and a page size of 25, your script makes roughly 20 paginated calls per polling cycle. Even if those calls were all issued within a single second, that stays below the 30 requests/second limit, but as the number of jobs grows you may start hitting it.&lt;/P&gt;
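If you do keep polling the API, pacing the paginated calls and retrying on 429 keeps the script resilient as the job count grows. A minimal sketch of that loop, where the page fetcher (`list_page`) and its `RateLimited` signal are placeholders standing in for whatever client you use, not an actual SDK API:

```python
import time

class RateLimited(Exception):
    """Raised by list_page when the API responds with HTTP 429."""

def fetch_all_runs(list_page, limit=25, base_delay=1.0, max_retries=5):
    """Paginate through job runs, retrying each page with exponential
    backoff whenever the fetcher signals a rate limit."""
    runs, offset = [], 0
    while True:
        delay = base_delay
        for _ in range(max_retries):
            try:
                page = list_page(offset=offset, limit=limit)
                break
            except RateLimited:
                time.sleep(delay)  # wait before retrying the same page
                delay *= 2         # exponential backoff
        else:
            raise RuntimeError("still rate limited after retries")
        runs.extend(page["runs"])
        if not page.get("has_more", False):
            return runs
        offset += limit
```

Because the fetcher is injected, the same loop works whether the pages come from raw REST calls or an SDK wrapper, and the backoff only delays the throttled page rather than aborting the whole sweep.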
&lt;P&gt;Alternatively, depending on your requirements, &lt;A href="https://docs.databricks.com/aws/en/admin/system-tables/jobs" target="_blank"&gt;system tables&lt;/A&gt; may be helpful. For example, the following SQL statement retrieves many job runs in a single query:&lt;/P&gt;
&lt;DIV&gt;
&lt;PRE&gt;SELECT * FROM job_run_timeline
WHERE workspace_id = "&amp;lt;workspace-id&amp;gt;"
AND period_start_time &amp;gt;= "2025-03-15T09:00:00"
AND period_end_time &amp;lt;= "2025-03-15T10:00:00"&lt;/PRE&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 17 Mar 2025 08:17:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-jobs-api-throttling/m-p/112773#M44322</guid>
      <dc:creator>koji_kawamura</dc:creator>
      <dc:date>2025-03-17T08:17:46Z</dc:date>
    </item>
  </channel>
</rss>

