Hi All,
I have a job continuously processing IoT data. The workflow reads data from Azure Event Hub and inserts it into the Databricks bronze layer. From there, the data is read, processed, validated, and inserted into the Databricks silver layer. The job uses a job cluster with a continuous trigger type.
The job fails about once a month with the error message “Cluster xxxx-221053-xxxxxxxx became unusable during the run since the driver became unhealthy.”
The support team has suggested restarting the streaming job regularly (weekly) to prevent this. To automate the restarts, I would need to create a separate time-triggered job that restarts the continuous job every week through the Databricks APIs.
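For reference, here is a rough sketch of what that weekly restart job could run. It calls the Jobs API endpoint `POST /api/2.1/jobs/runs/cancel-all` for the continuous job. It rests on an assumption I have not confirmed: that an unpaused continuous job starts a new run on its own once the active run is cancelled. The function names, the host, the token, and the job ID are placeholders of mine, not anything the support team provided.

```python
import json
import urllib.request


def build_cancel_all_request(host, token, job_id):
    """Build the POST request that cancels all active runs of one job."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/runs/cancel-all"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def restart_continuous_job(host, token, job_id, opener=urllib.request.urlopen):
    """Cancel the active run of the continuous job and return the HTTP status.

    Assumption: the continuous trigger then starts a fresh run by itself.
    `opener` is injectable so the call can be exercised without a workspace.
    """
    request = build_cancel_all_request(host, token, job_id)
    with opener(request, timeout=60) as response:
        return response.status


# Hypothetical usage inside the weekly job (all values are placeholders):
# restart_continuous_job("https://<workspace-host>", "<token>", 123456789)
```

The weekly job itself would just be a scheduled job with a single task running this script, with the token read from a secret scope rather than hard-coded.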
Is there any alternative solution that allows me to process live streaming data without requiring periodic restarts?