05-06-2023 10:08 AM
Hi Databricks Community,
I have a real-life use case that I would like to implement as soon as possible, which is why I am reaching out to you for implementation guidelines, ideas, documentation, and best practices.
Assume I am an IT manager and my production Databricks environment has many multi-purpose clusters as well as job clusters. As the manager of this production environment, I would like to monitor its usage, status, and errors from a dashboard and through email notifications, in the simplest way possible. The dashboard should show all the key information for utilization status, quick fault finding, cost reduction, and so on.
I hope I am not asking for anything impractical. Please share your input if possible.
Major points that I would like to cover:
1> Are my Databricks clusters under-utilized or over-utilized?
2> If my Databricks clusters are over-utilized, which process, set of queries, user, or time frame accounts for the high resource consumption?
2.a> Is any particular set of queries causing issues?
3> Assume one of my clusters has ‘1’ as the minimum and ‘20’ as the maximum number of nodes. How much of the time does node utilization stay above 70% (or any other percentage), and what do the utilization trend lines look like?
4> Notifications for events such as a cluster restarting or terminating, or a particular job failing x consecutive times, etc.
5> Anything else that should be monitored or acted on immediately.
05-07-2023 09:15 AM
@kunal debnath you can monitor most of these things in the cluster metrics tab: open any of your clusters and look at the metrics tab (DBR 13.0). For DBR <13.0 it will be the Ganglia UI.
You can also configure third-party monitoring tools such as Datadog.
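For points 3 and 4 in the question, here is a minimal Python sketch of the two calculations, assuming you have already exported the data from whatever monitoring source you use. Nothing here calls a Databricks API: `NodeSample`, `fraction_time_above`, and `has_consecutive_failures` are hypothetical names made up for illustration, "utilization" is assumed to mean active nodes divided by the cluster's max nodes, and the `"FAILED"` label is an assumed status string.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Sequence


@dataclass
class NodeSample:
    """One observation of a cluster's node count (hypothetical shape)."""
    timestamp: datetime
    active_nodes: int


def fraction_time_above(samples: Iterable[NodeSample],
                        max_nodes: int,
                        threshold: float = 0.70) -> float:
    """Share of the observed time during which active_nodes / max_nodes
    exceeded `threshold`. Each sample is assumed to hold until the next one."""
    if max_nodes <= 0:
        raise ValueError("max_nodes must be positive")
    ordered = sorted(samples, key=lambda s: s.timestamp)
    total_seconds = 0.0
    above_seconds = 0.0
    # Walk consecutive pairs; the last sample only closes the final interval.
    for current, following in zip(ordered, ordered[1:]):
        span = (following.timestamp - current.timestamp).total_seconds()
        total_seconds += span
        if current.active_nodes / max_nodes > threshold:
            above_seconds += span
    return above_seconds / total_seconds if total_seconds else 0.0


def has_consecutive_failures(run_states: Sequence[str], x: int) -> bool:
    """True if the most recent `x` runs all failed.
    `run_states` is ordered oldest to newest; "FAILED" is an assumed label."""
    if x <= 0:
        raise ValueError("x must be positive")
    recent = list(run_states)[-x:]
    return len(recent) == x and all(state == "FAILED" for state in recent)
```

The first function gives you the "how much time above 70%" number for a cluster with max 20 nodes by passing `max_nodes=20`; the second can drive an email alert once a job has failed x times in a row. How you collect the samples and send the notification depends on your setup.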
05-16-2023 03:08 AM
Thanks for your comments.
05-19-2023 01:29 AM
Hi @kunal debnath
Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
05-21-2023 09:04 PM
Hi Vidula,
Please share any further input you have. I would like to hear more about customized monitoring and industry best practices.
Thank you.