Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
What metrics should be considered for monitoring Databricks?

New Contributor

I am very new to Databricks and am just getting set up. I would like to explore its various features and start playing around with the environment.

I am curious to know which metrics should be considered for monitoring the complete Databricks setup (cluster level, app level, service level, etc.).

Could someone please help me with this?


Community Manager

Hi @Archana Devi K​, This tutorial walks you through using the Databricks Data Science and Engineering workspace to create a cluster and a notebook, create a table from a dataset, query the table, and display the results.

Honored Contributor

Hi @Archana Devi K​ 

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.


New Contributor II

Databricks is a powerful platform for data engineering, machine learning, and analytics, and it is important to monitor the performance and health of your Databricks environment to ensure that it is running smoothly.

Here are a few key metrics that you should consider monitoring in your Databricks environment:

  1. Cluster CPU and Memory Utilization: These metrics will give you an idea of how your clusters are performing and if they are being utilized efficiently.
  2. Job and Task Metrics: These metrics include job and task completion times, as well as the number of jobs and tasks running concurrently.
  3. Network Traffic: Monitoring network traffic will give you an idea of how data is flowing through your Databricks environment.
  4. Storage: Monitor the storage usage of the Databricks environment and make sure that the storage space is sufficient for data and logs.
  5. Errors and Logs: Monitor the errors and logs for troubleshooting and debugging purposes.
  6. Data Latency: Monitor the time it takes for data to be written to and read from storage.
  7. Cluster Auto-Scaling: Monitor the auto-scaling of the clusters to make sure that they are scaling up and down as needed.
  8. Security: Monitor the security of the environment by monitoring the authentication and authorization activity.
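As a minimal sketch of items 1 and 2 above, the snapshot of utilization metrics for each cluster can be checked against alert thresholds. The metric names and threshold values here are illustrative assumptions, not Databricks defaults:

```python
# Minimal sketch of a threshold-based health check over cluster metric
# snapshots. The field names and thresholds below are illustrative
# assumptions, not values defined by Databricks.

CPU_THRESHOLD = 0.85      # flag clusters above 85% average CPU
MEMORY_THRESHOLD = 0.90   # flag clusters above 90% memory utilization

def flag_unhealthy_clusters(snapshots):
    """Return names of clusters whose CPU or memory exceed the thresholds.

    `snapshots` is a list of dicts like:
        {"name": "etl-cluster", "cpu": 0.72, "memory": 0.55}
    where cpu/memory are utilization ratios in [0, 1].
    """
    flagged = []
    for snap in snapshots:
        if snap["cpu"] > CPU_THRESHOLD or snap["memory"] > MEMORY_THRESHOLD:
            flagged.append(snap["name"])
    return flagged
```

You would feed this from whatever metrics source you use (Ganglia/Spark UI metrics, cloud-provider monitoring, or a third-party agent) and wire the flagged list into an alert.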

It's also important to monitor the performance of the underlying infrastructure, like the disk I/O and CPU usage of the machines.

These are just a few examples of metrics that you may want to consider monitoring. The specific metrics that you will need to monitor will depend on your use case and the requirements of your Databricks environment.

Databricks has a built-in monitoring system that allows you to track and analyze these metrics and more. You can also set up alerts and dashboards to monitor critical metrics in real-time.
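One common way to feed a dashboard or alert is to pull cluster state from the Databricks REST API (`GET /api/2.0/clusters/list`) and aggregate it. The sketch below shows only the aggregation step over an already-fetched JSON response; the field names (`clusters`, `state`) match the Clusters API as I understand it, but verify them against the API version you are using:

```python
import json
from collections import Counter

def cluster_state_counts(api_response_text):
    """Count clusters by state from a Clusters API list response.

    Expects JSON of the shape {"clusters": [{"state": "RUNNING", ...}, ...]}
    as returned by GET /api/2.0/clusters/list (verify against your API version).
    """
    payload = json.loads(api_response_text)
    return Counter(c.get("state", "UNKNOWN") for c in payload.get("clusters", []))

# Example with a hand-written sample response (not real API output):
sample = json.dumps({
    "clusters": [
        {"cluster_name": "etl", "state": "RUNNING"},
        {"cluster_name": "adhoc", "state": "TERMINATED"},
        {"cluster_name": "ml", "state": "RUNNING"},
    ]
})
print(cluster_state_counts(sample))  # Counter({'RUNNING': 2, 'TERMINATED': 1})
```

Counts like these are easy to push into a dashboard or compare against expectations (e.g., alert if more clusters are running overnight than you planned for).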

You can also use third-party monitoring tools like Prometheus, Grafana, or Datadog to monitor your Databricks environment.
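If you go the Prometheus/Grafana route, gauges can be emitted in Prometheus's plain-text exposition format and scraped from a small exporter. The metric names in this sketch are made up for illustration, not standard Databricks metrics:

```python
def to_prometheus_text(metrics):
    """Render gauge values in the Prometheus text exposition format.

    `metrics` maps a metric name to a list of (labels, value) pairs, e.g.
        {"databricks_cluster_cpu_ratio": [({"cluster": "etl"}, 0.72)]}
    The metric names here are illustrative, not standard Databricks metrics.
    """
    lines = []
    for name, samples in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        for labels, value in samples:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"
```

A tiny HTTP server (or the official Prometheus client library for your language) would serve this text on a `/metrics` endpoint for Prometheus to scrape.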

It's important to test and monitor your setup regularly to make sure that it is performing as expected and to detect any potential issues early.
