Job Cluster with Continuous Trigger Type: Are Frequent Restarts Required?

Pramod_G
New Contributor II

Hi All,

I have a job continuously processing IoT data. The workflow reads data from Azure Event Hub and inserts it into the Databricks bronze layer. From there, the data is read, processed, validated, and inserted into the Databricks silver layer. The job uses a job cluster with a continuous trigger type.

My job fails about once a month with the error message “Cluster xxxx-221053-xxxxxxxx became unusable during the run since the driver became unhealthy.”

The support team has suggested implementing a frequent (weekly) restart of the streaming job to prevent such issues. To enable automatic restarts, I would need to create a time-triggered job that restarts the continuous job weekly using Databricks APIs.
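A minimal sketch of the weekly restart described above, using only the Python standard library and two Databricks Jobs API 2.1 endpoints (`runs/list` and `runs/cancel`). It assumes the continuous job stays unpaused, so that once its active run is cancelled the scheduler starts a new run on a fresh job cluster; please confirm that behaviour against the Databricks documentation for continuous jobs before relying on it. `restart_continuous_job` and `_request` are hypothetical helper names, the host, token, and job ID are placeholders, and the `opener` parameter exists only so the HTTP calls can be replaced in tests.

```python
import json
import urllib.parse
import urllib.request


def _request(host, token, method, path, params=None, body=None):
    """Build an authenticated urllib Request for a Jobs API 2.1 call."""
    url = host.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req


def restart_continuous_job(host, token, job_id, opener=urllib.request.urlopen):
    """Cancel the active run(s) of a continuous job; return the cancelled run IDs.

    Assumption: the job's continuous trigger is UNPAUSED, so Databricks
    starts a new run (with a new job cluster) after the cancellation.
    """
    # List only the runs of this job that are currently active.
    list_req = _request(host, token, "GET", "/api/2.1/jobs/runs/list",
                        params={"job_id": job_id, "active_only": "true"})
    with opener(list_req) as resp:
        runs = json.loads(resp.read()).get("runs", [])

    cancelled = []
    for run in runs:
        # Cancel each active run by its run_id.
        cancel_req = _request(host, token, "POST", "/api/2.1/jobs/runs/cancel",
                              body={"run_id": run["run_id"]})
        with opener(cancel_req) as resp:
            resp.read()  # response body is not needed here
        cancelled.append(run["run_id"])
    return cancelled
```

A weekly time-triggered job could call `restart_continuous_job(host, token, job_id)`, ideally reading the token from a secret scope instead of hard-coding it.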

Is there any alternative solution that allows me to process live streaming data without requiring periodic restarts?

Alberto_Umana
Databricks Employee

Hi @Pramod_G,

To suggest a better approach, we first need to understand why the driver became unhealthy. You can DM me the cluster and region details, and I can try to check. From the Spark metrics, do you see any resource issues?
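Spark's streaming progress reports don't show driver heap usage directly, but they can reveal whether a query is falling behind its input or accumulating state, which is one way to start on the resource question above. The sketch below assumes the progress reports are plain dicts in the StreamingQueryProgress JSON layout (as `query.recentProgress` returns them in PySpark 3.x); `summarize_progress` and its `lag_ratio` threshold are hypothetical and illustrative, not a Databricks tool.

```python
def summarize_progress(progress, lag_ratio=1.0):
    """Summarize possible resource-pressure signals from streaming progress dicts.

    `progress` is a list of StreamingQueryProgress JSON dicts, oldest first.
    The indicators and the lag_ratio threshold are illustrative only.
    """
    # Batches where processing throughput was below the input arrival rate.
    behind = [
        p["batchId"] for p in progress
        if p.get("processedRowsPerSecond", 0.0) < lag_ratio * p.get("inputRowsPerSecond", 0.0)
    ]
    # Total state-store memory reported per batch (0 for stateless queries).
    state_mem = [
        sum(op.get("memoryUsedBytes", 0) for op in p.get("stateOperators", []))
        for p in progress
    ]
    # Wall-clock time each trigger took, in milliseconds.
    trigger_ms = [p.get("durationMs", {}).get("triggerExecution", 0) for p in progress]
    return {
        "batches_falling_behind": behind,
        "state_memory_growing": len(state_mem) > 1
        and all(b > a for a, b in zip(state_mem, state_mem[1:])),
        "max_trigger_ms": max(trigger_ms, default=0),
    }
```

For a running query `q` returned by `writeStream...start()`, `summarize_progress(q.recentProgress)` gives a quick snapshot; steadily growing state or trigger durations between failures could point toward gradual memory pressure rather than a one-off fault.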

I've sent you the cluster details and region via DM.

Walter_C
Databricks Employee

How are you ingesting the data? Are you using the Delta Live Tables mechanism (https://docs.databricks.com/en/delta-live-tables/index.html)?

Pramod_G
New Contributor II

I am using a Spark session with readStream and writeStream on Delta tables.