
What metrics should be considered for monitoring Databricks?

Archana
New Contributor

I am very new to Databricks and am just getting set up. I would like to explore its various features and start experimenting with the environment.

I am curious to know which metrics should be considered for monitoring the complete Databricks setup (cluster level, app level, service level, etc.).

Could someone please help me with this?

3 REPLIES

Kaniz
Community Manager

Hi @Archana Devi K​, This tutorial walks you through using the Databricks Data Science and Engineering workspace to create a cluster and a notebook, create a table from a dataset, query the table, and display the results.

Vidula
Honored Contributor

Hi @Archana Devi K​ 

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

jessykoo32
New Contributor II

Databricks is a powerful platform for data engineering, machine learning, and analytics, and it is important to monitor the performance and health of your Databricks environment to ensure that it is running smoothly.

Here are a few key metrics that you should consider monitoring in your Databricks environment:

  1. Cluster CPU and Memory Utilization: These metrics will give you an idea of how your clusters are performing and if they are being utilized efficiently.
  2. Job and Task Metrics: These metrics include job and task completion times, as well as the number of jobs and tasks running concurrently.
  3. Network Traffic: Monitoring network traffic will give you an idea of how data is flowing through your Databricks environment.
  4. Storage: Monitor the storage usage of the Databricks environment and make sure that the storage space is sufficient for data and logs.
  5. Errors and Logs: Monitor the errors and logs for troubleshooting and debugging purposes.
  6. Data Latency: Monitor the time it takes for data to be written to and read from storage.
  7. Cluster Auto-Scaling: Monitor the auto-scaling of the clusters to make sure that they are scaling up and down as needed.
  8. Security: Monitor the security of the environment by monitoring the authentication and authorization activity.
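As a rough illustration of the job and task metrics in item 2, here is a minimal Python sketch that summarizes completion times, failure rate, and peak concurrency from a list of run records. The record shape (`start_time` and `end_time` in epoch milliseconds, plus a `result_state` string) and the sample data are assumptions made for this example, not the output of any particular Databricks API, so adapt the field names to whatever your run history actually contains.

```python
from statistics import mean

# Hypothetical sample records. The field names and values are assumptions
# for illustration; times are epoch milliseconds.
SAMPLE_RUNS = [
    {"run_id": 1, "start_time": 1_000, "end_time": 61_000, "result_state": "SUCCESS"},
    {"run_id": 2, "start_time": 2_000, "end_time": 32_000, "result_state": "FAILED"},
    {"run_id": 3, "start_time": 5_000, "end_time": 125_000, "result_state": "SUCCESS"},
]


def summarize_runs(runs):
    """Return run count, average and max duration (seconds), and failure rate."""
    if not runs:
        return {"count": 0, "avg_seconds": 0.0, "max_seconds": 0.0, "failure_rate": 0.0}
    durations = [(r["end_time"] - r["start_time"]) / 1000 for r in runs]
    failures = sum(1 for r in runs if r["result_state"] != "SUCCESS")
    return {
        "count": len(runs),
        "avg_seconds": mean(durations),
        "max_seconds": max(durations),
        "failure_rate": failures / len(runs),
    }


def max_concurrent_runs(runs):
    """Return the largest number of runs that overlapped in time."""
    events = []
    for r in runs:
        events.append((r["start_time"], 1))   # a run starts
        events.append((r["end_time"], -1))    # a run ends
    current = peak = 0
    # Sorting tuples puts an end (-1) before a start (+1) at the same instant,
    # so back-to-back runs are not counted as overlapping.
    for _, delta in sorted(events):
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    print(summarize_runs(SAMPLE_RUNS))
    print("peak concurrency:", max_concurrent_runs(SAMPLE_RUNS))
```

Tracking these numbers over time (for example, per day) makes it easier to spot jobs that are slowly getting longer or failing more often.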

It's also important to monitor the performance of the underlying infrastructure, such as disk I/O and CPU usage on the machines that back your clusters.

These are just a few examples of metrics that you may want to consider monitoring. The specific metrics that you will need to monitor will depend on your use case and the requirements of your Databricks environment.

Databricks has a built-in monitoring system that allows you to track and analyze these metrics and more. You can also set up alerts and dashboards to monitor critical metrics in real-time.
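To make the idea of alerting on critical metrics concrete, here is a small, tool-agnostic Python sketch of threshold rules evaluated against a metrics snapshot. The metric names (`cpu_percent`, `free_disk_gb`) and the limits are hypothetical, and this is not how Databricks alerts are configured; it only shows the kind of rule you would express in whichever alerting tool you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str               # key to look up in the snapshot (hypothetical names)
    limit: float              # value that triggers the alert when crossed
    direction: str = "above"  # "above" fires when value > limit, "below" when value < limit


def evaluate(thresholds, snapshot):
    """Return a list of alert messages for every breached or missing metric."""
    alerts = []
    for t in thresholds:
        value = snapshot.get(t.metric)
        if value is None:
            # Missing data is often a problem in its own right, so report it.
            alerts.append(f"{t.metric}: no data")
            continue
        breached = value > t.limit if t.direction == "above" else value < t.limit
        if breached:
            alerts.append(f"{t.metric}: {value} is {t.direction} limit {t.limit}")
    return alerts


if __name__ == "__main__":
    rules = [
        Threshold("cpu_percent", 85.0, "above"),
        Threshold("free_disk_gb", 20.0, "below"),
    ]
    # Hypothetical snapshot of current values.
    for message in evaluate(rules, {"cpu_percent": 92.5, "free_disk_gb": 50.0}):
        print(message)
```

Reporting "no data" separately from a breach is a deliberate choice here: a metric that stops arriving usually means the collector or the cluster itself needs attention.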

You can also use third-party monitoring tools like Prometheus, Grafana, or Datadog to monitor your Databricks environment.

It's important to test and monitor your setup regularly to make sure that it is performing as expected and to detect any potential issues early.
