Observability and monitoring across multiple workspaces (both job clusters and serverless compute)
Hi all,
What are the best options available today for observability and monitoring of Databricks jobs across all workspaces? We have hundreds of workspaces, and it is hard to monitor which jobs failed and which succeeded.
We tried using:
1. Teams webhooks to notify ourselves of errors, but this is not very scalable.
2. Grafana and Datadog, but they depend on init scripts, which are no longer an option on serverless compute.
3. System tables (compute and job timelines), but they lack resource usage metrics.
4. The Databricks Workflows UI, but it is limited to a single workspace, so it does not scale.
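On the cross-workspace point specifically, the job run timeline system table carries a workspace_id column, so one query against the metastore can cover every attached workspace, serverless runs included. A rough sketch using the databricks-sql-connector package; the hostname, HTTP path, token, and the exact set of failure states are placeholders to adjust for your environment:

```python
# Sketch: a single system-tables query spans all workspaces attached to the
# metastore. Connection details below are placeholders, not real endpoints.
FAILED_RUNS_QUERY = """
SELECT workspace_id, job_id, run_id, result_state, period_end_time
FROM system.lakeflow.job_run_timeline
WHERE period_end_time >= current_timestamp() - INTERVAL 1 DAY
  AND result_state IN ('FAILED', 'TIMED_OUT', 'ERROR')  -- adjust states to your schema
ORDER BY period_end_time DESC
"""

def fetch_failed_runs(server_hostname: str, http_path: str, access_token: str):
    """Return recent failed job runs across all workspaces in the metastore."""
    from databricks import sql  # assumption: databricks-sql-connector is installed
    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cur:
            cur.execute(FAILED_RUNS_QUERY)
            return cur.fetchall()

def group_failures_by_workspace(rows):
    """Roll query rows (workspace_id first column) up into per-workspace counts."""
    counts = {}
    for workspace_id, *_ in rows:
        counts[workspace_id] = counts.get(workspace_id, 0) + 1
    return counts
```

A dashboard or SQL alert on top of a query like this would give the fleet-wide failure overview, though as noted it still won't surface resource usage metrics for serverless.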
What we want to have:
1. An overview of failed and successful jobs across all workspaces.
2. Failure alerts with easy navigation to the application logs.
3. Email alerts (nice to have).
4. Support for serverless compute.
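Lacking a single pane of glass, one stopgap for the wishlist above is a small poller that walks a list of workspace URLs and calls the Jobs API in each; because it works at the API level rather than via init scripts, it covers serverless runs too. A minimal sketch; the hosts and tokens are placeholders, and wiring the summary into Teams or email is left out:

```python
# Sketch: poll each workspace's Jobs API (2.1 runs/list) and merge the results
# into one fleet-wide summary. Hosts/tokens are placeholders.
from collections import Counter

def summarize_runs(runs):
    """Count terminal result states from Jobs API run objects."""
    states = Counter()
    for run in runs:
        result = run.get("state", {}).get("result_state")
        if result:  # still-running runs have no result_state yet
            states[result] += 1
    return dict(states)

def list_recent_runs(host, token, start_time_ms):
    """Fetch completed runs since start_time_ms from one workspace."""
    import requests  # assumption: requests is installed
    resp = requests.get(
        f"{host}/api/2.1/jobs/runs/list",
        headers={"Authorization": f"Bearer {token}"},
        params={"completed_only": "true", "start_time_from": start_time_ms},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("runs", [])

def fleet_summary(workspaces, start_time_ms):
    """workspaces: list of (host, token) pairs -> {host: {result_state: count}}."""
    return {host: summarize_runs(list_recent_runs(host, token, start_time_ms))
            for host, token in workspaces}
```

Anything with a non-SUCCESS state in the summary can then trigger a webhook or email, with the run's page URL giving the jump-off point to its logs.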
Thanks in advance!
Best regards,
sunny

