<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: REST API for Pipeline Events does not return all records in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/90554#M37956</link>
    <pubDate>Mon, 16 Sep 2024 11:09:15 GMT</pubDate>
    <dc:creator>wise_owl</dc:creator>
    <dc:date>2024-09-16T11:09:15Z</dc:date>
    <item>
      <title>REST API for Pipeline Events does not return all records</title>
      <link>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/71114#M34248</link>
      <description>&lt;P&gt;I'm using the REST API to retrieve pipeline events, following the documentation:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/api/workspace/pipelines/listpipelineevents" target="_blank"&gt;https://docs.databricks.com/api/workspace/pipelines/listpipelineevents&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I can retrieve some records, but the API stops returning results after one or two calls. I verified the row count with the "event_log()" table-valued function (TVF), which returns over 300 records. The API consistently returns only 34-35 records before stopping. I also tried the same thing with the Databricks SDK, and the result is the same: 34-35 records.&lt;/P&gt;&lt;P&gt;&lt;A href="https://databricks-sdk-py.readthedocs.io/en/latest/workspace/pipelines/pipelines.html" target="_blank"&gt;https://databricks-sdk-py.readthedocs.io/en/latest/workspace/pipelines/pipelines.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 May 2024 19:14:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/71114#M34248</guid>
      <dc:creator>JUPin</dc:creator>
      <dc:date>2024-05-30T19:14:51Z</dc:date>
    </item>
    <item>
      <title>Re: REST API for Pipeline Events does not return all records</title>
      <link>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/71510#M34329</link>
      <description>&lt;P&gt;Thanks for responding.&lt;/P&gt;&lt;P&gt;I've investigated your suggestions; here are my findings:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Check the max_results parameter&lt;/STRONG&gt;: Ensure that you’re not inadvertently limiting the number of results returned. The default value is 1000, but you can adjust it as needed. -- I've adjusted this over several runs, and the results become erratic whenever I hard-code a value. For example, if I set "max_results=1000", I get an error message stating that the maximum value is 250. If I set it to 100, the "display()" statements sometimes stop working altogether, and I have to detach and reattach the compute cluster to get them working again. If I set it anywhere from 10 to 25, I consistently retrieve 35 rows.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Inspect the filter criteria&lt;/STRONG&gt;: If you’re using any filters (such as level='INFO' or timestamp &amp;gt; 'TIMESTAMP'), review them to make sure they’re not unintentionally restricting the results. -- Yes, I've tried the filters, and they don't seem to make a difference. As a suggestion, I would strongly encourage adding a filter on "update_id".&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Pagination&lt;/STRONG&gt;: The API response includes pagination tokens (next_page_token and prev_page_token). Make sure you’re handling these tokens correctly to retrieve all available events. If you’re not using them, you might be getting only the first page of results. -- Yes, I pass "next_page_token" in my subsequent API calls. With "max_results=25", for example, I get the first page of data, then use "next_page_token" to fetch the next set, which contains 10 records. That second set has no "next_page_token".&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Rate Limiting&lt;/STRONG&gt;: Check if there’s any rate limiting or throttling applied to your API requests. Some APIs limit the number of requests per minute or hour. -- I don't receive any rate-limiting errors. My code keeps calling the API until it gets no further results, and I can even make the calls manually, so I don't believe rate limiting is the issue.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Error Handling&lt;/STRONG&gt;: Inspect the response for any error messages or status codes. It’s possible that an error is occurring during the API call. -- I've checked all the status codes and messages that are returned, and I don't receive any errors.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I'm currently setting up a very simple example of the API call issue and of the SDK behavior to upload.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2024 18:12:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/71510#M34329</guid>
      <dc:creator>JUPin</dc:creator>
      <dc:date>2024-06-03T18:12:00Z</dc:date>
    </item>
    <item>
      <title>Re: REST API for Pipeline Events does not return all records</title>
      <link>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/71781#M34396</link>
      <description>&lt;P&gt;I've attached some screenshots of the API call. The first call retrieves 59 records (Event Log API1.png) and returns a populated "next_page_token"; however, when I pull the next set of data using that "next_page_token", the result set is empty (Event Log API2.png). Meanwhile, the SQL result from "event_log()" shows over 322 records (SQL event_log results.png).&lt;/P&gt;</description>
      <pubDate>Wed, 05 Jun 2024 16:39:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/71781#M34396</guid>
      <dc:creator>JUPin</dc:creator>
      <dc:date>2024-06-05T16:39:03Z</dc:date>
    </item>
    <item>
      <title>Re: REST API for Pipeline Events does not return all records</title>
      <link>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/90554#M37956</link>
      <description>&lt;P&gt;You can adapt this code. It retrieves every page by passing the &lt;STRONG&gt;"next_page_token"&lt;/STRONG&gt; value from each response back as "page_token" in the next request:&lt;/P&gt;&lt;P&gt;Don't forget to mark this solution as correct if it helped you &lt;span class="lia-unicode-emoji" title=":upside_down_face:"&gt;🙃&lt;/span&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import requests

token = 'your token'
# This example pages through the Jobs list endpoint
# (e.g. https://your-workspace-host/api/2.1/jobs/list)
url = 'your URL'

params = {'expand_tasks': 'true'}
header = {'Authorization': f'Bearer {token}'}

while True:
    response = requests.get(url, headers=header, params=params)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    response_data = response.json()
    jobs = response_data.get('jobs', [])

    for job in jobs:
        settings = job.get('settings', {})
        tasks = settings.get('tasks')

        # Report jobs whose first task runs on an existing (all-purpose) cluster
        if tasks and tasks[0].get('existing_cluster_id'):
            job_name = settings.get('name')
            job_creator = job.get('creator_user_name')
            print(f'job creator name= {job_creator} &amp;amp; job name= {job_name}')
        else:
            print(f"{settings.get('name')} not running on an existing all-purpose cluster")

    # Stop when the API no longer returns a token for another page
    next_page_token = response_data.get('next_page_token')
    if not next_page_token:
        break

    params['page_token'] = next_page_token&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 16 Sep 2024 11:09:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rest-api-for-pipeline-events-does-not-return-all-records/m-p/90554#M37956</guid>
      <dc:creator>wise_owl</dc:creator>
      <dc:date>2024-09-16T11:09:15Z</dc:date>
    </item>
  </channel>
</rss>

