Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What metrics should be considered for monitoring Databricks?

Archana
New Contributor

I am very new to Databricks and am just getting set up. I would like to explore its various features and start experimenting with the environment.

I am curious to know which metrics should be considered for monitoring the complete Databricks setup (cluster level, app level, service level, etc.).

Could someone please help me with this?


Vidula
Honored Contributor

Hi @Archana Devi K,

Hope all is well! I just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

jessykoo32
New Contributor II

Databricks is a powerful platform for data engineering, machine learning, and analytics. Monitoring the performance and health of your Databricks environment is important to ensure that it runs smoothly.

Here are a few key metrics that you should consider monitoring in your Databricks environment:

  1. Cluster CPU and Memory Utilization: These metrics show how your clusters are performing and whether they are being utilized efficiently.
  2. Job and Task Metrics: These include job and task completion times, as well as the number of jobs and tasks running concurrently.
  3. Network Traffic: Monitoring network traffic shows how data flows through your Databricks environment.
  4. Storage: Monitor the storage usage of your Databricks environment to make sure there is enough space for data and logs.
  5. Errors and Logs: Monitor errors and logs for troubleshooting and debugging.
  6. Data Latency: Monitor the time it takes to write data to and read data from storage.
  7. Cluster Auto-Scaling: Monitor cluster auto-scaling to make sure clusters scale up and down as needed.
  8. Security: Monitor authentication and authorization activity to keep track of the security of the environment.
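Job and task metrics are usually derived from run records with start and end times. As a minimal sketch, assuming you have already exported run history into simple records (the `JobRun` class and its fields are hypothetical, not a Databricks API), completion-time percentiles and peak concurrency could be computed like this:

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class JobRun:
    # Hypothetical record shape: start/end times in seconds since the epoch.
    run_id: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def peak_concurrency(runs: list[JobRun]) -> int:
    """Maximum number of runs active at the same moment (sweep line)."""
    events = []
    for r in runs:
        events.append((r.start, 1))   # a run starts
        events.append((r.end, -1))    # a run ends
    # At equal timestamps, -1 sorts before +1, so a run that ends exactly
    # when another starts is not counted as overlapping.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def duration_summary(runs: list[JobRun]) -> dict[str, float]:
    """Median, nearest-rank 95th percentile, and max run duration."""
    durations = sorted(r.duration for r in runs)
    if not durations:
        return {}
    p95_index = math.ceil(0.95 * len(durations)) - 1
    return {"median": median(durations), "p95": durations[p95_index], "max": durations[-1]}


runs = [JobRun("a", 0, 10), JobRun("b", 5, 20), JobRun("c", 10, 15), JobRun("d", 30, 40)]
print(peak_concurrency(runs))  # 2
print(duration_summary(runs))  # {'median': 10.0, 'p95': 15, 'max': 15}
```

Tracking the p95 alongside the median helps catch the occasional slow run that an average would hide.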

It's also important to monitor the performance of the underlying infrastructure, such as disk I/O and CPU usage on the machines that make up your clusters.
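For a quick look at host-level health on a single machine, the Python standard library is enough. This is a minimal sketch (Unix only, since `os.getloadavg` is not available on Windows); run from a notebook it would sample only the driver node. It covers CPU load and disk space; disk I/O throughput would need OS counters such as `/proc/diskstats` on Linux.

```python
import os
import shutil


def node_snapshot(path: str = "/") -> dict[str, float]:
    """Sample basic host-level health on the current machine (Unix only)."""
    # 1-minute load average per core approximates CPU saturation; values
    # above 1.0 mean more runnable work than cores. On Linux the load
    # average also counts tasks blocked on disk I/O.
    load_1m, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "load_per_cpu": load_1m / cpus,
        "disk_used_pct": 100.0 * usage.used / usage.total,
        "disk_free_gb": usage.free / 1e9,
    }


if __name__ == "__main__":
    for name, value in node_snapshot().items():
        print(f"{name}: {value:.2f}")
```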

These are just a few examples of metrics that you may want to consider monitoring. The specific metrics that you will need to monitor will depend on your use case and the requirements of your Databricks environment.

Databricks includes built-in monitoring tools, such as the compute metrics charts, the Spark UI, and the driver and executor logs, that let you track and analyze many of these metrics. You can also set up alerts and dashboards to watch critical metrics in near real time.
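A common refinement for alerts is to fire only when a metric stays above its threshold for several consecutive samples, so that a single spike does not page anyone (similar in spirit to the `for` clause in Prometheus alerting rules). Here is a minimal sketch of that idea; the class and the metric name `cluster_cpu_pct` are hypothetical and not part of any Databricks API:

```python
from collections import deque


class ThresholdAlert:
    """Fire only after `window` consecutive samples exceed `threshold`.

    Requiring several breaches in a row suppresses one-off spikes
    (for example, a brief CPU burst during a shuffle).
    """

    def __init__(self, metric: str, threshold: float, window: int = 3) -> None:
        self.metric = metric
        self.threshold = threshold
        self.window = window
        # Keep only the breach flags for the last `window` samples.
        self._recent: deque[bool] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one sample; return True if the alert should fire now."""
        self._recent.append(value > self.threshold)
        return len(self._recent) == self.window and all(self._recent)


alert = ThresholdAlert("cluster_cpu_pct", threshold=90.0, window=3)
print([alert.observe(v) for v in [95, 97, 80, 92, 93, 96]])
# [False, False, False, False, False, True]
```

A larger `window` means fewer false alarms but slower detection, so it is worth tuning per metric.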

You can also use third-party monitoring tools like Prometheus, Grafana, or Datadog to monitor your Databricks environment.

It's important to test and monitor your setup regularly to make sure that it is performing as expected and to detect any potential issues early.
