Cluster Monitoring

kunaldeb
New Contributor III

Hi Databricks Community,

I have a real-life use case that I would like to implement as soon as possible, so I am reaching out to you for implementation guidelines, ideas, documentation, and best practices.

Assume I am an IT manager, and my production Databricks environment has many multi-purpose clusters as well as job clusters. As the manager of that environment, I would like to monitor usage, status, and errors from a dashboard, with email notifications, in the simplest way possible. The dashboard should show all the key information needed for utilization status, quick fault finding, cost reduction, and so on.

I hope I am not asking for anything impractical. Please share your input if possible.

Major points that I would like to cover:

1> Are my Databricks clusters under-utilized or over-utilized?

2> If my clusters are over-utilized, which process, set of queries, user, or time frame accounts for the high resource consumption?

2.a> Is any particular set of queries causing problems?

3> If one of my clusters has a minimum of 1 node and a maximum of 20 nodes, how much of the time does node utilization stay above 70% (or any other threshold), and what do the utilization trend lines look like?

4> Notifications for events such as a cluster restarting or terminating, or a particular job failing x times in a row.

5> Anything else that should be monitored or acted on immediately.
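
Point 3 comes down to a simple calculation once utilization samples are available. Here is a minimal sketch, assuming the samples have already been exported as percentages taken at a fixed interval. The function name `fraction_above` and the sample values are hypothetical and do not come from any Databricks API.

```python
from typing import Sequence


def fraction_above(samples: Sequence[float], threshold: float = 70.0) -> float:
    """Return the share of samples strictly above `threshold` (0.0 to 1.0).

    With evenly spaced samples, this approximates the share of time
    spent above the threshold.
    """
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical utilization percentages sampled at a fixed interval.
cpu_percent = [35.0, 82.5, 91.0, 64.0, 75.5, 40.0, 88.0, 71.0]

share = fraction_above(cpu_percent, threshold=70.0)
print(f"{share:.1%} of samples are above 70%")
```

The same function can be applied per hour or per day to build a trend line.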

ACCEPTED SOLUTION

karthik_p
Esteemed Contributor

@kunal debnath You can monitor most of these things in the cluster's Metrics tab: open any of your clusters and select the Metrics tab (DBR 13.0 and above). For DBR below 13.0, it will be the Ganglia UI.

You can also configure third-party monitoring tools such as Datadog.
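
For the notification case in the question (a job failing x times in a row), the check itself is easy to express whichever tool raises the alert. This is a rough sketch, assuming the run outcomes have already been collected as a list of state strings ordered oldest to newest. The names `trailing_failures` and `should_alert` and the labels `"FAILED"` and `"SUCCESS"` are placeholders for illustration, not a confirmed API.

```python
from typing import Iterable


def trailing_failures(states: Iterable[str], failed_state: str = "FAILED") -> int:
    """Count how many of the most recent runs failed in a row.

    `states` must be ordered oldest to newest; any non-failed run
    resets the streak to zero.
    """
    streak = 0
    for state in states:
        streak = streak + 1 if state == failed_state else 0
    return streak


def should_alert(states: Iterable[str], max_consecutive: int = 3) -> bool:
    """Return True when the current failure streak reaches the limit."""
    return trailing_failures(states) >= max_consecutive


# Hypothetical run history for one job, oldest first.
history = ["SUCCESS", "FAILED", "SUCCESS", "FAILED", "FAILED", "FAILED"]

if should_alert(history, max_consecutive=3):
    print("Job has failed 3 or more times in a row; send a notification.")
```

Counting only the trailing streak means a single successful run clears the alert condition, which is usually what "consecutive" is meant to capture.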

REPLIES

kunaldeb
New Contributor III

Thanks for your comments.

Anonymous
Not applicable

Hi @kunal debnath​ 

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

kunaldeb
New Contributor III

Hi Vidula,

Please share any further input you have. I would like to hear more about customized monitoring and industry best practices.

Thank you.