Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Displaying job-run progress when submitting jobs via databricks-sdk

jmeidam
New Contributor

When I run notebooks from within a notebook using `dbutils.notebook.run`, I see a nice progress table that updates automatically, showing the execution time, the status, and links to the notebooks. It is seamless.

My goal now is to execute many notebooks or Python scripts in parallel and have Spark figure out when to run what and what resources to use. Essentially, I want to create a job with a dynamic number of parallel tasks and submit it. I can do that using the submit method of the WorkspaceClient jobs API. That works exactly as intended; however, the output is just a linear collection of prints to standard output. I want a nice progress table such as the one I get when using `dbutils.notebook.run`. This is very convenient for navigating to failed notebooks/scripts and having a clear overview of all the task results in that job run.
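For reference, this is roughly how I build and submit the run (a sketch; the cluster ID and notebook paths are placeholders):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Dynamically built list of notebooks to run in parallel (placeholder paths)
notebook_paths = ["/Repos/me/project/nb_a", "/Repos/me/project/nb_b"]

tasks = [
    jobs.SubmitTask(
        task_key=f"task_{i}",
        notebook_task=jobs.NotebookTask(notebook_path=path),
        existing_cluster_id="1234-567890-abcdefgh",  # placeholder cluster ID
    )
    for i, path in enumerate(notebook_paths)
]

waiter = w.jobs.submit(run_name="parallel-notebooks", tasks=tasks)
print(f"Submitted run {waiter.run_id}")  # from here on I only get linear prints
```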

Do any of you know how this progress table is generated, and can I reproduce that in my own processes and loops?

This is the table I want to see:
[screenshot: progress table]

2 REPLIES

mark_ott
Databricks Employee

Direct UI Replication Is Not Supported Natively:
Databricks does not currently publish a public API or widget to embed the same progress table in notebooks for arbitrary parallel tasks, scripts, or jobs launched via the WorkspaceClient Jobs API.

Workarounds and Alternatives

1. Use Job Task API + Output Tracking

  • You can collect status, results, and links via the Jobs API: collect each task's run ID, status, and notebook/script path, then poll for status updates from Python.

  • Display this as a custom Markdown or HTML table in your notebook, but links won't be "magic"—they require manual formatting.

  • Example skeleton:

    ```python
    # For illustration: print one Markdown row per task
    for task in job_tasks:
        print(
            f"| {task['name']} | {task['status']} | {task['runtime']} | "
            f"[Link]({get_databricks_url(task['run_id'])}) |"
        )
    ```

    This is manual: you need to loop through run info, fetch status via the API, and build the table yourself (a fuller polling sketch follows after this list).

2. Databricks Widgets with Polling

  • Use notebook widgets (dbutils.widgets) inside a master tracking notebook. Have each child task update its widget status, which the master notebook then polls and formats for display.

  • You must program status updates and presentation logic manually.

3. Job Results Table Notebook

  • After job completion, create a summary notebook to call the Jobs API, collect statuses, durations, and notebook/script links, and format them as a Markdown/HTML table.

  • Provide clickable links to failed/completed runs using run IDs.
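As a rough sketch of option 1 (not an official snippet), the loop below polls a submitted run with the databricks-sdk WorkspaceClient and prints a Markdown progress table. It assumes you already hold the run_id returned by jobs.submit; fields such as execution_duration (milliseconds) and run_page_url come from the Jobs API run response, so verify them against your SDK version.

```python
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

def print_progress(run_id: int, poll_seconds: int = 30) -> None:
    """Poll a submitted job run and print a Markdown progress table until it finishes."""
    terminal = {
        jobs.RunLifeCycleState.TERMINATED,
        jobs.RunLifeCycleState.SKIPPED,
        jobs.RunLifeCycleState.INTERNAL_ERROR,
    }
    while True:
        run = w.jobs.get_run(run_id=run_id)
        print("| Task | Status | Time (s) | Link |")
        print("|------|--------|----------|------|")
        for t in run.tasks or []:
            if t.state:
                status = (t.state.result_state or t.state.life_cycle_state).value
            else:
                status = "PENDING"
            seconds = (t.execution_duration or 0) / 1000
            # Link to the parent run page; per-task pages can be looked up separately
            print(f"| {t.task_key} | {status} | {seconds:.0f} | {run.run_page_url} |")
        if run.state and run.state.life_cycle_state in terminal:
            break
        time.sleep(poll_seconds)

# print_progress(run_id)  # run_id returned by w.jobs.submit(...)
```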

Example: Pseudo-Progress Table

| Task Name | Status | Time (s) | Link |
|-----------|--------|----------|------|
| Notebook A | Success | 23 | Open |
| Script B | Failed | 58 | Open |

You build this by querying job run info (via the Jobs API) and rendering it in Markdown or HTML so it can be displayed in a notebook. For dynamic updates, you'd need refreshable widgets or periodic polling.
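A hedged sketch of that rendering step, assuming the same Jobs API fields as above plus the displayHTML helper available in Databricks notebooks; the per-task URL lookup is an assumption, and you can fall back to the parent run's run_page_url if it doesn't hold in your workspace:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

def summary_table_html(run_id: int) -> str:
    """Build an HTML summary of a job run with a clickable link per task."""
    run = w.jobs.get_run(run_id=run_id)
    rows = []
    for t in run.tasks or []:
        status = (t.state.result_state or t.state.life_cycle_state).value if t.state else "UNKNOWN"
        seconds = (t.execution_duration or 0) / 1000
        # Assumption: fetching the task's own run returns a run_page_url for that task run
        url = w.jobs.get_run(run_id=t.run_id).run_page_url
        rows.append(
            f"<tr><td>{t.task_key}</td><td>{status}</td>"
            f"<td>{seconds:.0f}</td><td><a href='{url}'>Open</a></td></tr>"
        )
    header = "<tr><th>Task Name</th><th>Status</th><th>Time (s)</th><th>Link</th></tr>"
    return f"<table>{header}{''.join(rows)}</table>"

# In a notebook cell, after the run has finished:
# displayHTML(summary_table_html(run_id))
```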

Summary

  • The progress table feature in Databricks notebooks is not publicly available as an API or widget for custom jobs or scripts.

  • You can build similar tables manually by polling job/task status from the Jobs API and formatting it in Markdown/HTML within your notebook.

  • Direct live updates and interactive linking work cleanly only with dbutils.notebook.run inside the Databricks UI; custom approaches require more engineering effort and will have some limitations.

If your main goal is navigation and monitoring convenience, focus on collecting run IDs and statuses, polling the API, and formatting a summary table in your master notebook with links to the relevant outputs.

Coffee77
Contributor III

All good in @mark_ott's response. As a potential improvement, instead of polling, I think it would be better to publish events to a bus (e.g. Azure Event Hub) from the notebooks, so that consumers can run their queries as they receive, process, and filter events.

The event-publishing functionality can be implemented in a shared library. I could provide code if needed.
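For illustration only, a minimal sketch of such a shared helper could look like the following, assuming the azure-eventhub package and a connection string kept in a secret scope (all names and payload fields here are illustrative, not the library mentioned above):

```python
import json

from azure.eventhub import EventData, EventHubProducerClient

def publish_task_event(connection_string: str, eventhub_name: str, payload: dict) -> None:
    """Publish a task-status event so consumers can react instead of polling the Jobs API."""
    producer = EventHubProducerClient.from_connection_string(
        conn_str=connection_string, eventhub_name=eventhub_name
    )
    with producer:
        batch = producer.create_batch()
        batch.add(EventData(json.dumps(payload)))
        producer.send_batch(batch)

# Called from a notebook/task (illustrative values):
# publish_task_event(conn_str, "job-status", {"task": "nb_a", "status": "SUCCESS", "run_id": 123})
```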


Lifelong Solution Architect Learner | Coffee & Data