Overwatch is an observability tool that helps you monitor cloud spending and track usage across various dimensions. It works by collecting job and audit log data, then joining it with data from the Databricks REST API and other sources available in the platform. This data is processed into a set of tables that describe the ongoing activity of your Databricks Workspace(s).
Overwatch is maintained as part of Databricks Labs and supports all the major clouds: Azure, AWS, and GCP. In this post we will look at a variety of analytics made possible by Overwatch, then discuss what a multi-Workspace deployment is and how to implement it!
Features of Overwatch
2. Cluster count by type
3. DBU cost vs compute cost
1. Most expensive clusters by day
2. DBU spend by cluster type
3. Cluster node types
4. Percentage of auto-scaling clusters
5. Scale up time of clusters without pools
6. Cluster failure state and count of failures
1. DBUs by Workflow by Workspace by date
2. Jobs running in Interactive clusters
3. Daily job status distribution
4. Impact of failure by Workspace
Here are some other analyses you can perform with Overwatch:
Last 30 days spend: Aggregate cost of cluster spend across all workspaces for the last 30 days.
Month-over-month change in spend: Percentage change in cluster spend compared with the previous month. A negative percentage means spend is down from the previous month; a positive percentage means it is up.
Top 3 cluster spend by workspace in the last 30 days: The three clusters with the highest spend in each Workspace.
Week-over-week top 10 fastest growing clusters by Workspace: The 10 clusters whose spend grew fastest compared with the previous week. A negative percentage means a cluster's spend fell relative to the previous week; a positive percentage means it grew.
Last 7 days of spend by Databricks Workflow: Expenses for each job in the last 7 days.
Last 7 days of spend for Databricks Workflows executed on interactive clusters: Expenses for jobs run on interactive clusters in the previous 7 days.
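To make the arithmetic behind a few of these analyses concrete, here is a minimal Python sketch. It does not use Overwatch's own tables or API: the records, the function names, and the choice to approximate "month over month" by comparing two consecutive 30-day windows are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily cost rows: (workspace, cluster, day, cost in USD).
# These values are invented for illustration and are not Overwatch output.
RECORDS = [
    ("ws-a", "etl", date(2024, 3, 20), 100.0),
    ("ws-a", "adhoc", date(2024, 3, 25), 50.0),
    ("ws-a", "ml", date(2024, 3, 28), 30.0),
    ("ws-a", "tiny", date(2024, 3, 29), 5.0),
    ("ws-b", "etl", date(2024, 3, 15), 65.0),
    ("ws-a", "etl", date(2024, 2, 20), 150.0),
    ("ws-b", "etl", date(2024, 2, 10), 50.0),
]


def spend_in_window(records, end, days):
    """Total cost over the `days`-day window ending on `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    return sum(cost for _, _, day, cost in records if start <= day <= end)


def pct_change(current, previous):
    """Percentage change from `previous` to `current`; None without a baseline."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def top_clusters_by_workspace(records, end, days, n):
    """Return the `n` highest-spend clusters per workspace within the window."""
    start = end - timedelta(days=days - 1)
    totals = defaultdict(float)
    for workspace, cluster, day, cost in records:
        if start <= day <= end:
            totals[(workspace, cluster)] += cost
    ranked = defaultdict(list)
    # Walk clusters from most to least expensive, keeping the first n per workspace.
    for (workspace, cluster), total in sorted(totals.items(), key=lambda kv: -kv[1]):
        if len(ranked[workspace]) < n:
            ranked[workspace].append((cluster, total))
    return dict(ranked)


today = date(2024, 3, 31)
last_30 = spend_in_window(RECORDS, today, 30)
prior_30 = spend_in_window(RECORDS, today - timedelta(days=30), 30)
change = pct_change(last_30, prior_30)
top3 = top_clusters_by_workspace(RECORDS, today, 30, 3)
```

With this sample data, `change` is positive because spend in the most recent window exceeds the prior window, and `top3["ws-a"]` omits the cheapest cluster. The same pattern with a 7-day window and a limit of 10 would approximate the week-over-week ranking.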
What is a multi-Workspace deployment of Overwatch?
If you have multiple Databricks workspaces and want to monitor them together, you can use a multi-Workspace deployment. Monitoring jobs separately in each of, say, 100 workspaces would be daunting. With a multi-Workspace deployment, a single job in one Workspace collects data from all specified Workspaces and writes it to a centralized database in your Lakehouse. You can then query that data from any workspace you choose, which streamlines monitoring.
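The idea of one job gathering data from many Workspaces into a single queryable store can be sketched in a few lines of Python. This is a conceptual illustration only, not Overwatch's implementation: `JobRun`, `fake_fetch`, `aggregate`, and the sample data are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    """One row in the hypothetical centralized table."""
    workspace_id: str
    job_name: str
    cost_usd: float


# Hypothetical stand-in for the data each Workspace would expose.
FAKE_SOURCE = {
    "ws-1": [("nightly-etl", 10.0), ("report", 5.0)],
    "ws-2": [("training", 7.5)],
}


def fake_fetch(workspace_id):
    """Pretend to pull (job_name, cost) rows from one Workspace."""
    return FAKE_SOURCE.get(workspace_id, [])


def aggregate(workspace_ids, fetch):
    """Collect rows from every listed Workspace, tagging each with its origin."""
    central = []
    for workspace_id in workspace_ids:
        for job_name, cost in fetch(workspace_id):
            central.append(JobRun(workspace_id, job_name, cost))
    return central


def spend_by_workspace(central):
    """Query the centralized rows: total cost per Workspace."""
    totals = {}
    for run in central:
        totals[run.workspace_id] = totals.get(run.workspace_id, 0.0) + run.cost_usd
    return totals


central_table = aggregate(["ws-1", "ws-2"], fake_fetch)
totals = spend_by_workspace(central_table)
```

Tagging every row with its source Workspace is what allows a single table to answer both per-Workspace and cross-Workspace questions.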
Architecture of a multi-Workspace Deployment:
Overwatch can be deployed on a single, primary Workspace and then retrieve data from all other Databricks Workspaces. For more details on requirements, see Multi-Workspace Consideration. In many cases, some Workspaces should be able to monitor many Workspaces while others should monitor only themselves. Where the output data is co-located and who should be able to access which data also come into play. This reference architecture can accommodate all of these needs. To learn more, walk through the deployment steps in the official Overwatch documentation.
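One way to reason about such a mixed setup is to write the topology down as data and check it. The sketch below is purely illustrative: the `TOPOLOGY` mapping, the workspace names, and both helper functions are hypothetical and do not reflect Overwatch's actual configuration format.

```python
# Hypothetical topology: each key is a Workspace that runs a monitoring job,
# and each value is the set of Workspaces that job covers.
TOPOLOGY = {
    "central": {"central", "dev", "staging"},  # monitors many Workspaces
    "prod": {"prod"},                          # monitors only itself
}


def monitored_by(topology, target):
    """List the Workspaces whose monitoring job covers `target`."""
    return sorted(ws for ws, covered in topology.items() if target in covered)


def unmonitored(topology, all_workspaces):
    """List Workspaces that no monitoring job covers."""
    covered = set().union(*topology.values())
    return sorted(set(all_workspaces) - covered)


dev_monitors = monitored_by(TOPOLOGY, "dev")
gaps = unmonitored(TOPOLOGY, ["central", "dev", "staging", "prod", "sandbox"])
```

A check like `unmonitored` can flag a Workspace that was left out of every deployment before the gap shows up as missing data.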