05-30-2024 12:14 PM
I'm using the REST API to retrieve Pipeline Events per the documentation:
https://docs.databricks.com/api/workspace/pipelines/listpipelineevents
I am able to retrieve some records, but the API stops after a call or two. I verified the number of rows using the TVF "event_log", which returns over 300 records, while the API consistently returns 34-35 records before stopping. Furthermore, I used the Databricks SDK to attempt the same thing, but the results are the same (34-35 records).
https://databricks-sdk-py.readthedocs.io/en/latest/workspace/pipelines/pipelines.html
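For reference, the call pattern boils down to the sketch below. The host, token, and pipeline ID are placeholders, and the pagination loop is factored out into a generator so it can be exercised without a live workspace:

```python
import json
import urllib.parse
import urllib.request

def iter_pipeline_events(get_page):
    """Yield events across all pages.

    get_page(page_token) must return one decoded JSON response dict
    for GET /api/2.0/pipelines/{pipeline_id}/events.
    """
    page_token = None
    while True:
        body = get_page(page_token)
        yield from body.get("events", [])
        page_token = body.get("next_page_token")
        if not page_token:
            return

def rest_page_fetcher(host, api_token, pipeline_id, max_results=100):
    """Build a get_page callable over the REST endpoint (host and token are placeholders)."""
    def get_page(page_token):
        params = {"max_results": max_results}
        if page_token:
            params["page_token"] = page_token
        url = (f"{host}/api/2.0/pipelines/{pipeline_id}/events?"
               + urllib.parse.urlencode(params))
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return get_page
```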
05-31-2024 01:43 AM
Hi @JUPin, it seems you're encountering an issue with the Databricks REST API for retrieving pipeline events. Let's explore this further and see if we can identify the cause.
First, let's review the relevant information from the Databricks REST API reference:
GET /api/2.0/pipelines/{pipeline_id}/events
The request accepts max_results (maximum number of entries to return), order_by (sort order by timestamp), and filter (criteria to select a subset of results). The API returns at most max_results events in a response, even if more events are available.
Given that you're only receiving 34-35 records consistently, here are some troubleshooting steps you can take:
Check the max_results parameter: Ensure that you're not inadvertently limiting the number of results returned. The default value is 1000, but you can adjust it as needed.
Inspect the filter criteria: If you're using any filters (such as level='INFO' or timestamp > 'TIMESTAMP'), review them to make sure they're not unintentionally restricting the results.
Pagination: The API response includes pagination tokens (next_page_token and prev_page_token). Make sure you're handling these tokens correctly to retrieve all available events. If you're not using them, you might be getting only the first page of results.
Rate Limiting: Check if there's any rate limiting or throttling applied to your API requests. Some APIs limit the number of requests per minute or hour.
Error Handling: Inspect the response for any error messages or status codes. It's possible that an error is occurring during the API call.
Regarding the Databricks SDK, you mentioned that you encountered the same issue. Make sure you're using the correct SDK method to retrieve pipeline events. You can refer to the Databricks SDK documentation for details on how to use the list_pipeline_events function.
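As a rough sketch of driving the SDK, list_pipeline_events is expected to return an iterator that handles pagination for you. The WorkspaceClient lines are commented out since they need a live workspace; the small summarizer below is just an illustration of consuming the full stream:

```python
from collections import Counter

def summarize_events(events):
    """Count events by level from an iterable of event dicts (or SDK objects via .level)."""
    levels = Counter()
    for e in events:
        level = e.get("level") if isinstance(e, dict) else getattr(e, "level", None)
        levels[level or "UNKNOWN"] += 1
    return dict(levels)

# With the Databricks SDK (assumed installed and configured):
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# events = w.pipelines.list_pipeline_events(pipeline_id="<pipeline-id>")
# print(summarize_events(events))
```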
I hope this helps you troubleshoot the issue! Let me know if you need further assistance or have additional details to share.
06-03-2024 11:12 AM
Thanks for responding,
I've investigated your suggestions; here are my findings:
Check the max_results parameter: Ensure that you're not inadvertently limiting the number of results returned. The default value is 1000, but you can adjust it as needed. -- I've adjusted this over several runs. The results get very wonky when I have a hard-set value; for example, if I set "max_results=1000", I get an error message stating the maximum value can be only 250. If I set it to 100 (for example), sometimes the "display()" statements stop working altogether, and I have to detach and reattach the compute cluster for it to start working again. If I set it between 10 and 25, the results consistently retrieve 35 rows.
Inspect the filter criteria: If you're using any filters (such as level='INFO' or timestamp > 'TIMESTAMP'), review them to make sure they're not unintentionally restricting the results. -- Yes, I've tried the filters; this doesn't seem to make a difference. As a suggestion, I would strongly encourage a filter on the "update_id".
Pagination: The API response includes pagination tokens (next_page_token and prev_page_token). Make sure you're handling these tokens correctly to retrieve all available events. If you're not using them, you might be getting only the first page of results. -- Yes, I use "next_page_token" in my subsequent API calls. Depending on how I set "max_results", for example "max_results=25", I get the original data pull, then I use the "next_page_token" to get the next set, which is 10 records. The second set doesn't have a "next_page_token".
Rate Limiting: Check if there's any rate limiting or throttling applied to your API requests. Some APIs limit the number of requests per minute or hour. -- I don't receive any rate-limiting error. The API continues to call until it receives no response; I can even do it manually, so I don't believe this is an issue.
Error Handling: Inspect the response for any error messages or status codes. It's possible that an error is occurring during the API call. -- I've checked all the error messages and status codes that return; I do not receive any errors.
Currently, I'm trying to set up a very simple example of the API call issue and the SDK to upload.
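A very simple harness along those lines records each page's size and whether a next_page_token came back, which pinpoints where the stream ends early. The page fetch is injected as a callable (a stand-in for whatever makes the HTTP call), so the loop itself can be checked without a workspace:

```python
def debug_pages(get_page):
    """Walk the pagination and record (page_size, had_next_token) per page.

    get_page(page_token) returns one decoded JSON response dict from
    the pipeline events endpoint; pass None for the first page.
    """
    report, page_token = [], None
    while True:
        body = get_page(page_token)
        events = body.get("events", [])
        page_token = body.get("next_page_token")
        report.append((len(events), bool(page_token)))
        if not page_token:
            return report
```

Run against the live endpoint, a result like [(25, True), (10, False)] would confirm the behavior described above: two pages totaling 35 rows, then no further token.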
06-05-2024 09:39 AM
I've attached some screenshots of the API call. It shows "59" records retrieved (Event Log API1.png) and a populated "next_page_token"; however, when I pull the next set of data using the "next_page_token", the result set is empty (Event Log API2.png). Meanwhile, the SQL result from "event_log()" shows over 322 records (SQL event_log results.png).
2 weeks ago
You can leverage this code base. It works as expected using the "next_page_token" parameter.
Don't forget to mark this solution as correct if this helped you!
import requests

token = 'your token'
url = 'your URL'
params = {'expand_tasks': 'true'}
header = {'Authorization': f'Bearer {token}'}

while True:
    response = requests.get(url, headers=header, params=params)
    response_data = response.json()
    jobs = response_data.get("jobs", [])
    for job in jobs:
        settings = job.get('settings')
        task = settings.get('tasks')
        if task and task[0].get('existing_cluster_id'):
            job_name = settings.get('name')
            job_creator = job.get('creator_user_name')
            print(f'job creator name= {job_creator} & job name= {job_name}')
        else:
            print(f"{settings.get('name')} not running on ACL")
    # Keep requesting pages until the API stops returning a token
    next_page_token = response_data.get('next_page_token')
    if not next_page_token:
        break
    params['page_token'] = next_page_token