Databricks Community

Pramod_G · 3 hours ago

Hi All,

I have a job continuously processing IoT data. The workflow reads data from Azure Event Hub and inserts it into the Databricks bronze layer. From there, the data is read, processed, validated, and inserted into the Databricks silver layer. The job uses a job cluster with a continuous trigger type.

My job fails about once a month with the error message “Cluster xxxx-221053-xxxxxxxx became unusable during the run since the driver became unhealthy.”

The support team has suggested implementing a frequent (weekly) restart of the streaming job to prevent such issues. To enable automatic restarts, I would need to create a time-triggered job that restarts the continuous job weekly using Databricks APIs.
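For reference, the weekly restart the support team describes could be sketched roughly as follows. It is a minimal illustration, not a Databricks-provided tool. It lists the active run of the continuous job through the Jobs API 2.1 `runs/list` endpoint and cancels it with `runs/cancel`. It assumes the continuous trigger then starts a fresh run, and with it a fresh job cluster, on its own, so check that behaviour against the Jobs documentation for your workspace. The helper names `make_transport` and `restart_continuous_job`, and the host, token, and job ID you pass in, are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List

# A transport takes (HTTP method, API path, params/body) and returns the parsed JSON reply.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def make_transport(host: str, token: str) -> Transport:
    """Hypothetical minimal JSON client for the Databricks REST API (token auth)."""
    def call(method: str, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        url = host.rstrip("/") + path
        data = None
        if method == "GET":
            # GET endpoints such as runs/list take their parameters in the query string.
            url += "?" + urllib.parse.urlencode(payload)
        else:
            data = json.dumps(payload).encode()
        req = urllib.request.Request(
            url,
            data=data,
            method=method,
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read() or b"{}")
    return call


def restart_continuous_job(call: Transport, job_id: int) -> List[int]:
    """Cancel every active run of a continuous job and return the cancelled run IDs.

    Assumption: the continuous trigger starts a new run by itself afterwards;
    this sketch does not wait for or verify that new run.
    """
    listing = call("GET", "/api/2.1/jobs/runs/list",
                   {"job_id": job_id, "active_only": "true"})
    cancelled = []
    for run in listing.get("runs", []):
        call("POST", "/api/2.1/jobs/runs/cancel", {"run_id": run["run_id"]})
        cancelled.append(run["run_id"])
    return cancelled
```

In the weekly time-triggered job, a call like `restart_continuous_job(make_transport(host, token), job_id)` would do the restart, with the host and token read from somewhere safe such as a secret scope. The transport is injected so the restart logic can be exercised without a live workspace. Pagination of `runs/list` is ignored here because a continuous job has only one active run at a time.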

Is there any alternative solution that allows me to process live streaming data without requiring periodic restarts?

Alberto_Umana · 3 hours ago

Hi @Pramod_G,

To give a better suggestion, we first need to understand why the driver became unhealthy. You can DM me the cluster and region details, and I can try to check. Do the Spark metrics show any resource issues?
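To find out why the driver became unhealthy, one possible starting point is the cluster event log, which records driver health changes. The sketch below pages through the Clusters API `events` endpoint and keeps only driver-related event types. The helper names `make_post` and `driver_health_events`, the chosen event-type filter, and the time window are assumptions made for illustration. The real cluster ID from the error message is elided in the post and must be supplied by you.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

# A poster takes (API path, JSON body) and returns the parsed JSON reply.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumed filter: event types related to driver health and Spark failures.
DRIVER_EVENT_TYPES = ["DRIVER_NOT_RESPONDING", "DRIVER_UNAVAILABLE",
                      "DRIVER_HEALTHY", "SPARK_EXCEPTION"]


def make_post(host: str, token: str) -> Post:
    """Hypothetical minimal POST client for the Databricks REST API (token auth)."""
    def post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        req = urllib.request.Request(
            host.rstrip("/") + path,
            data=json.dumps(body).encode(),
            method="POST",
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read() or b"{}")
    return post


def driver_health_events(post: Post, cluster_id: str,
                         start_ms: int, end_ms: int) -> List[Dict[str, Any]]:
    """Collect driver-related events for a cluster between two epoch-millisecond times."""
    body: Dict[str, Any] = {
        "cluster_id": cluster_id,
        "start_time": start_ms,
        "end_time": end_ms,
        "event_types": DRIVER_EVENT_TYPES,
        "order": "ASC",
        "limit": 50,
    }
    events: List[Dict[str, Any]] = []
    while True:
        page = post("/api/2.0/clusters/events", body)
        events.extend(page.get("events", []))
        # The API returns the request body for the next page, or omits it when done.
        body = page.get("next_page")
        if not body:
            return events
```

Each returned event carries a `timestamp`, a `type`, and a `details` object. Lining these up against the Spark metrics around the time of the failure may show whether the driver was under memory or GC pressure before it stopped responding.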