Hi Databricks Community,
We're running Databricks on AWS and would like to improve operational incident management for production workloads.
Is there any official Databricks documentation or recommended approach to integrate with ServiceNow for automated incident/ticket creation (e.g., on job failure, cluster issues, etc.)?
For high-priority job failures, what are the best options to configure real-time notifications to a phone (SMS/voice/push)?
- Are there native capabilities in Databricks Workflows/Jobs for this, or is the recommended pattern to integrate with AWS services (SNS, EventBridge) or third-party incident tools (PagerDuty, Opsgenie, etc.)?
Any guidance, reference architectures, or example implementations would be appreciated.
Thanks,
Narendra Vempala
@Naren.Samurai