SriramMohanty
Databricks Employee

What is Overwatch?

overwatch.png

Overwatch is an observability tool that helps you monitor spending across your clouds and track usage along various dimensions. It works by collecting job and audit log data, then joining it with data from the Databricks REST API and other sources available in the platform. This data is processed into a set of tables that describe the ongoing activity of your Databricks Workspace(s).

Overwatch is maintained as part of Databricks Labs and supports all the major clouds: Azure, AWS, and GCP. In this post we will look at a variety of analytics made possible by Overwatch, then discuss what a multi-Workspace deployment is and how to implement it!

 

Features of Overwatch

Monitoring Workspaces

  1. Total spend 

SriramMohanty_0-1701149209768.png

2. Cluster count by type 

SriramMohanty_2-1701149336228.png

3. DBU cost vs compute cost 

SriramMohanty_3-1701149813391.png

 

Monitoring Clusters

  1. Most expensive clusters by day

SriramMohanty_0-1701183538445.png

2. DBU spend by cluster type

SriramMohanty_1-1701183574382.png

3. Cluster node types

SriramMohanty_2-1701183606944.png

4. Percentage of auto-scaling clusters

SriramMohanty_3-1701183629800.png

5. Scale up time of clusters without pools

SriramMohanty_4-1701183781343.png

6. Cluster failure state and count of failures

SriramMohanty_5-1701183812941.png

Monitoring Jobs

  1. DBUs by Workflow by Workspace by date

SriramMohanty_6-1701184000914.png

2. Jobs running in Interactive clusters

SriramMohanty_7-1701184033554.png

3. Daily job status distribution

SriramMohanty_8-1701184057763.png

4. Impact of failure by Workspace

SriramMohanty_9-1701184097575.png

Here are some other analyses you can perform with Overwatch:

  1. Last 30 days spend 
    Aggregate cost of cluster spend in all workspaces for the last 30 days.

  2. Month-over-month change in spend
    Percentage change in cluster spend compared with the previous month. For example, a percentage below zero means spend is down from the previous month, and a percentage above zero means it is up.

  3. Top 3 cluster spend by workspace in the last 30 days
    Shows the three clusters with the highest spend in each Workspace.

  4. Week-over-week top 10 fastest growing clusters by Workspace
    Top 10 clusters with the fastest growth in spend compared with the previous week. For example, a percentage below zero means spend on that cluster fell relative to the previous week, and a percentage above zero means it grew.

  5. Last 7 days of spend by Databricks Workflow
    Expenses for each job in the last 7 days.

  6. Last 7 days of spend for Databricks Workflows executed on interactive clusters
    Expenses for jobs run on interactive clusters in the previous 7 days.
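
The arithmetic behind a couple of these analyses can be sketched in plain Python. This is a minimal illustration, not Overwatch code: the record layout (`workspace`, `cluster`, `month`, `cost`), the helper names, and the sample figures are all assumptions invented for the example. In a real deployment you would query the Overwatch tables instead.

```python
from collections import defaultdict

# Hypothetical spend records; field names and values are invented for illustration.
records = [
    {"workspace": "ws-a", "cluster": "etl", "month": "2023-10", "cost": 400.0},
    {"workspace": "ws-a", "cluster": "etl", "month": "2023-11", "cost": 500.0},
    {"workspace": "ws-a", "cluster": "adhoc", "month": "2023-11", "cost": 120.0},
    {"workspace": "ws-a", "cluster": "ml", "month": "2023-11", "cost": 300.0},
    {"workspace": "ws-a", "cluster": "bi", "month": "2023-11", "cost": 80.0},
    {"workspace": "ws-b", "cluster": "stream", "month": "2023-10", "cost": 600.0},
    {"workspace": "ws-b", "cluster": "stream", "month": "2023-11", "cost": 450.0},
]


def pct_change(previous, current):
    """Percentage change from previous to current; None if previous is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def spend_by_month(rows):
    """Total spend per month across all workspaces."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["month"]] += row["cost"]
    return dict(totals)


def top_clusters(rows, month, n=3):
    """The n highest-spending clusters per workspace for one month."""
    per_ws = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if row["month"] == month:
            per_ws[row["workspace"]][row["cluster"]] += row["cost"]
    return {
        ws: sorted(clusters.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for ws, clusters in per_ws.items()
    }


monthly = spend_by_month(records)
mom = pct_change(monthly["2023-10"], monthly["2023-11"])
top3 = top_clusters(records, "2023-11")
print(f"Month-over-month change: {mom:.1f}%")
print(top3)
```

A negative result from `pct_change` corresponds to the "below zero" case described above. The zero-baseline case returns `None` rather than raising, since a cluster that did not exist in the previous period has no meaningful growth percentage.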

What is a multi-Workspace deployment of Overwatch?

If you have multiple Databricks Workspaces and want to monitor them together, you can use a multi-Workspace deployment. Monitoring jobs separately in each of, say, 100 Workspaces would be daunting. With a multi-Workspace deployment, a single job in one Workspace collects data from all the Workspaces you specify and writes it to a centralized database in your Lakehouse. You can then query the data from any Workspace you choose, which simplifies monitoring.
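
The idea can be pictured as one job that visits each configured Workspace, tags every record with the Workspace it came from, and appends it to a shared store. The toy model below illustrates only that concept. `fetch_job_runs`, the record fields, and the in-memory `central_table` are hypothetical and do not reflect Overwatch's actual internals or table schemas.

```python
from collections import Counter

# Hypothetical per-workspace data; a real collector would read logs and APIs.
FAKE_SOURCES = {
    "ws-a": [{"job": "ingest", "status": "SUCCESS"}, {"job": "report", "status": "FAILED"}],
    "ws-b": [{"job": "train", "status": "SUCCESS"}],
    "ws-c": [{"job": "export", "status": "FAILED"}, {"job": "export", "status": "FAILED"}],
}


def fetch_job_runs(workspace_id):
    """Stand-in for collecting job run data from one workspace."""
    return FAKE_SOURCES.get(workspace_id, [])


def aggregate(workspace_ids):
    """Collect runs from every workspace into one list, tagged by origin."""
    central_table = []
    for ws in workspace_ids:
        for run in fetch_job_runs(ws):
            central_table.append({"workspace": ws, **run})
    return central_table


central = aggregate(["ws-a", "ws-b", "ws-c"])

# A single query over the central table answers a cross-workspace question.
failures = Counter(r["workspace"] for r in central if r["status"] == "FAILED")
print(failures.most_common())
```

Because every record carries its Workspace of origin, a question such as "which Workspace had the most failed runs?" becomes a single grouped count rather than one query per Workspace.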

Architecture of a multi-Workspace Deployment:

Overwatch can be deployed on a single, primary Workspace and then retrieve data from all other Databricks Workspaces. For more details on requirements, see Multi-Workspace Consideration. In many cases, some Workspaces should be able to monitor many Workspaces while others should monitor only themselves. Where the output data is located and who should be able to access which data also come into play. This reference architecture can accommodate all of these needs. To learn more, walk through the deployment steps in the official Overwatch documentation.

 

SriramMohanty_11-1701184716367.png

How to perform a multi-Workspace deployment

  1. Download the CSV file.
  2. Fill in the CSV with the details of the Workspaces you want to monitor. Please refer to the column descriptions to learn more about the columns.
  3. Add the dependent library.
  4. Run it via Notebook (example here), or run it as a JAR.
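
Before running the deployment, it can help to sanity-check the filled-in CSV. The sketch below is an illustration under stated assumptions, not part of Overwatch: the column names `workspace_name`, `workspace_id`, `workspace_url`, and `cloud` are hypothetical stand-ins for whichever columns the official column descriptions mark as required, and `validate_config` is a helper invented for this example.

```python
import csv
import io

# Hypothetical required columns; replace them with the ones listed in the
# official column descriptions for your Overwatch version.
REQUIRED_COLUMNS = ["workspace_name", "workspace_id", "workspace_url", "cloud"]
SUPPORTED_CLOUDS = {"Azure", "AWS", "GCP"}  # the clouds named in this post


def validate_config(csv_text):
    """Return a list of human-readable problems found in the config CSV."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        ws_id = (row["workspace_id"] or "").strip()
        if ws_id and ws_id in seen_ids:
            problems.append(f"line {line_no}: duplicate workspace_id {ws_id}")
        seen_ids.add(ws_id)
        cloud = (row["cloud"] or "").strip()
        if cloud and cloud not in SUPPORTED_CLOUDS:
            problems.append(f"line {line_no}: unknown cloud {cloud}")
    return problems


# Made-up sample with three deliberate mistakes.
sample = """workspace_name,workspace_id,workspace_url,cloud
prod,111,https://prod.example.com,Azure
dev,111,https://dev.example.com,AWS
test,333,,Oracle
"""

for problem in validate_config(sample):
    print(problem)
```

Catching an empty field or a duplicated Workspace ID at this stage is cheaper than discovering it after the deployment job has started. Which checks are appropriate depends on the actual column descriptions.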

For more details and instructions, please visit the official site for Overwatch. You can raise an issue directly at this link.