Overwatch is an observability tool that helps you monitor spending across your cloud environments and track usage along various dimensions. It works by collecting job and audit log data, then joining it with data from the Databricks REST API and other sources available in the platform. This data is processed into a set of tables that describe the ongoing activity of your Databricks Workspace(s).
Overwatch is maintained as part of Databricks Labs and supports all the major clouds: Azure, AWS, and GCP. In this post we will look at a variety of analytics made possible by Overwatch, then discuss what a multi-Workspace deployment is and how to implement it!
Monitoring Workspaces
2. Cluster count by type
3. DBU cost vs compute cost
Monitoring Clusters
2. DBU spend by cluster type
3. Cluster node types
4. Percentage of auto-scaling clusters
5. Scale up time of clusters without pools
6. Cluster failure state and count of failures
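To make two of the cluster metrics above concrete, here is a minimal sketch of how DBU spend by cluster type and the percentage of auto-scaling clusters could be computed. The records and field names (`cluster_type`, `autoscale`, `dbu_cost_usd`) are hypothetical and chosen for illustration only; they are not the actual Overwatch table schema.

```python
from collections import defaultdict

# Hypothetical cluster records; field names are illustrative,
# not the real Overwatch schema.
clusters = [
    {"cluster_id": "c1", "cluster_type": "interactive", "autoscale": True, "dbu_cost_usd": 120.0},
    {"cluster_id": "c2", "cluster_type": "automated", "autoscale": False, "dbu_cost_usd": 80.0},
    {"cluster_id": "c3", "cluster_type": "automated", "autoscale": True, "dbu_cost_usd": 40.0},
    {"cluster_id": "c4", "cluster_type": "interactive", "autoscale": False, "dbu_cost_usd": 60.0},
]


def dbu_spend_by_type(records):
    """Sum DBU cost per cluster type."""
    totals = defaultdict(float)
    for record in records:
        totals[record["cluster_type"]] += record["dbu_cost_usd"]
    return dict(totals)


def autoscaling_percentage(records):
    """Return the share of clusters with auto-scaling enabled, in percent."""
    if not records:
        return 0.0
    autoscaling = sum(1 for record in records if record["autoscale"])
    return 100.0 * autoscaling / len(records)


spend = dbu_spend_by_type(clusters)
pct_autoscaling = autoscaling_percentage(clusters)
```

In a real deployment the same aggregations would more likely be written as queries against the tables Overwatch produces; plain Python is used here only to keep the logic easy to follow.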
Monitoring Jobs
2. Jobs running in Interactive clusters
3. Daily job status distribution
4. Impact of failure by Workspace
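As a rough illustration of the job analyses listed above, the sketch below computes a daily job status distribution and a failure count per Workspace. The job run records and their fields (`workspace`, `date`, `status`) are assumptions made for this example and do not reflect the actual Overwatch schema. Failure impact is approximated here by a simple count; a real analysis might weigh failures by cost or duration instead.

```python
from collections import Counter, defaultdict

# Hypothetical job run records; field names and values are illustrative only.
job_runs = [
    {"workspace": "ws-a", "date": "2024-01-01", "status": "SUCCEEDED"},
    {"workspace": "ws-a", "date": "2024-01-01", "status": "FAILED"},
    {"workspace": "ws-b", "date": "2024-01-01", "status": "SUCCEEDED"},
    {"workspace": "ws-b", "date": "2024-01-02", "status": "FAILED"},
    {"workspace": "ws-b", "date": "2024-01-02", "status": "FAILED"},
    {"workspace": "ws-a", "date": "2024-01-02", "status": "SUCCEEDED"},
]


def daily_status_distribution(runs):
    """Count job runs per status for each day."""
    distribution = defaultdict(Counter)
    for run in runs:
        distribution[run["date"]][run["status"]] += 1
    return {day: dict(counts) for day, counts in distribution.items()}


def failures_by_workspace(runs):
    """Count failed job runs per Workspace."""
    failed = Counter(run["workspace"] for run in runs if run["status"] == "FAILED")
    return dict(failed)


daily = daily_status_distribution(job_runs)
failures = failures_by_workspace(job_runs)
```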
Here are some other analyses you can perform with Overwatch:
If you have multiple Databricks Workspaces and want to monitor them together, you can implement a multi-Workspace deployment. Monitoring jobs separately in each of, say, 100 Workspaces would be daunting. With a multi-Workspace deployment, a single job in one Workspace can collect data from all specified Workspaces and write it to a centralized database in your Lakehouse. This lets you query the data from any Workspace of your choosing, which simplifies monitoring.
Overwatch can be deployed on a single, primary Workspace and then retrieve data from all other Databricks Workspaces. For more details on requirements, see Multi-Workspace Consideration. There are many cases where some Workspaces should be able to monitor many Workspaces while others should only monitor themselves. Co-location of the output data, and who should be able to access which data, also come into play. This reference architecture can accommodate all of these needs. To learn more, walk through the deployment steps in the official Overwatch documentation.
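The idea of one job pulling data from many Workspaces into a central store can be sketched in a few lines. This is a conceptual illustration only, not how Overwatch is implemented or configured: `aggregate_workspaces`, `fetch_stub`, `monitoring_scope`, and the Workspace IDs are all hypothetical names, and the stub stands in for whatever mechanism actually retrieves data from each Workspace.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def aggregate_workspaces(
    workspace_ids: Iterable[str],
    fetch_records: Callable[[str], List[Record]],
) -> List[Record]:
    """Collect records from each Workspace and tag them with their origin."""
    central_table: List[Record] = []
    for ws_id in workspace_ids:
        for record in fetch_records(ws_id):
            # Tagging every row lets one central table be filtered per Workspace.
            central_table.append({**record, "workspace_id": ws_id})
    return central_table


def fetch_stub(ws_id: str) -> List[Record]:
    """Hypothetical stand-in for retrieving job data from one Workspace."""
    sample = {
        "ws-a": [{"job": "etl", "status": "SUCCEEDED"}],
        "ws-b": [
            {"job": "report", "status": "FAILED"},
            {"job": "ml", "status": "SUCCEEDED"},
        ],
    }
    return sample.get(ws_id, [])


# Hypothetical scopes: the primary Workspace monitors everything,
# while ws-a only monitors itself.
monitoring_scope = {
    "ws-central": ["ws-central", "ws-a", "ws-b"],
    "ws-a": ["ws-a"],
}

central = aggregate_workspaces(monitoring_scope["ws-central"], fetch_stub)
local = aggregate_workspaces(monitoring_scope["ws-a"], fetch_stub)
```

Keeping the scope as an explicit mapping mirrors the point made above: some Workspaces monitor many others, some only themselves, and the choice is a matter of configuration rather than code.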
For more details and instructions, please visit the official site for Overwatch. You can also raise an issue directly at this link.