05-18-2026 02:02 PM
I don't think there is anything native for this in Databricks. The closest match would be the system tables (system.lakeflow.job_run_timeline / job_task_run_timeline), but I don't think they have the necessary grain for your pattern.
There are probably two ways to approach it.
Approach 1:
- Enable Change Data Feed on your status Delta table: ALTER TABLE … SET TBLPROPERTIES (delta.enableChangeDataFeed = true).
- Create a Lakebase Postgres instance and a synced table in Continuous mode — it replicates the Delta table to Postgres with a minimum refresh interval of ~15 seconds.
- Your backend API queries Postgres directly with a plain SELECT COUNT(*) FILTER (WHERE status='COMPLETED'), COUNT(*) FROM tasks WHERE run_id = ? — millisecond latency, no Databricks SQL warehouse spin-up cost per request, and no progress logic in the processing job. Lakebase supports up to 1,000 concurrent connections, so you can poll from the frontend safely.
- Bonus: same Lakebase instance can back other operational lookups for your app.
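A minimal sketch of the backend side of Approach 1, assuming the synced table is exposed in Postgres as `tasks` with `run_id` and `status` columns as in the query above. `fetch_progress` and `Progress` are hypothetical names, and `conn` is assumed to be any DB-API 2.0 connection (for example one opened with psycopg) to the Lakebase instance, so the `?` placeholder becomes `%s`:

```python
from dataclasses import dataclass

# Same aggregate as in the post; "%s" is the psycopg-style placeholder
# (assumption: the driver uses the "format" paramstyle).
PROGRESS_SQL = (
    "SELECT COUNT(*) FILTER (WHERE status = 'COMPLETED'), COUNT(*) "
    "FROM tasks WHERE run_id = %s"
)


@dataclass
class Progress:
    completed: int
    total: int

    @property
    def percent(self) -> float:
        # Guard against a run that has no task rows synced yet.
        return 0.0 if self.total == 0 else 100.0 * self.completed / self.total


def fetch_progress(conn, run_id: str) -> Progress:
    """Run the aggregate for one run_id and return completed/total counts."""
    cur = conn.cursor()
    try:
        cur.execute(PROGRESS_SQL, (run_id,))
        completed, total = cur.fetchone()
    finally:
        cur.close()
    return Progress(int(completed), int(total))
```

The API handler would call `fetch_progress(conn, run_id)` on each poll and return `completed`, `total`, and `percent` to the frontend. Connection pooling is left out here and would be worth adding in a real service.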
Approach 2:
- Point your backend at a small serverless SQL warehouse and call the Statement Execution API (https://docs.databricks.com/api/workspace/statementexecution) with a parameterized aggregate query keyed by run_id.
- Cache results in your backend for a few seconds to avoid hammering the warehouse.
- Trade-off: serverless warehouse cold-starts and per-query latency are higher than Postgres; fine for a progress bar polled every 3–10s, less ideal if you need sub-second updates.
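Approach 2 could look roughly like the sketch below. It posts a parameterized statement to the Statement Execution API (`POST /api/2.0/sql/statements`) and wraps the call in a small in-memory cache with a few seconds of TTL. The table name `tasks`, the helper names (`build_payload`, `parse_progress`, `query_progress`, `TTLCache`), and the 3-second TTL are assumptions for illustration; in practice the table would normally be referenced by its full catalog.schema.table name.

```python
import json
import time
import urllib.request


def build_payload(warehouse_id: str, run_id: str) -> dict:
    # Named parameter :run_id keeps the query parameterized.
    return {
        "warehouse_id": warehouse_id,
        "statement": (
            "SELECT COUNT(*) FILTER (WHERE status = 'COMPLETED') AS completed, "
            "COUNT(*) AS total FROM tasks WHERE run_id = :run_id"
        ),
        "parameters": [{"name": "run_id", "value": run_id, "type": "STRING"}],
        "wait_timeout": "10s",  # wait synchronously for up to 10 seconds
    }


def parse_progress(body: dict) -> tuple:
    # The statement may still be PENDING/RUNNING if the warehouse is cold.
    state = body["status"]["state"]
    if state != "SUCCEEDED":
        raise RuntimeError(f"statement not finished, state={state}")
    completed, total = body["result"]["data_array"][0]  # values arrive as strings
    return int(completed), int(total)


def query_progress(host: str, token: str, warehouse_id: str, run_id: str) -> tuple:
    req = urllib.request.Request(
        f"{host}/api/2.0/sql/statements",
        data=json.dumps(build_payload(warehouse_id, run_id)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_progress(json.load(resp))


class TTLCache:
    """Serve a cached value per key until it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds: float = 3.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (fetched_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

A backend would build one cache per process, for example `TTLCache(lambda run_id: query_progress(host, token, warehouse_id, run_id))`, and call `cache.get(run_id)` from the polling endpoint, so many frontend polls within the TTL result in a single warehouse query.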
You can continue using your existing job-success/fail webhook for the terminal signal. Use the approaches above only for the in-flight progress updates (e.g. 200/500 tasks done). That avoids hammering anything when the run is idle.
~Mohan Mathews, Lead DSA.