12-18-2024 12:12 PM
Hi All,
I have a job continuously processing IoT data. The workflow reads data from Azure Event Hub and inserts it into the Databricks bronze layer. From there, the data is read, processed, validated, and inserted into the Databricks silver layer. The job uses a job cluster with a continuous trigger type.
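For reference, the ingestion into bronze looks roughly like the sketch below; the namespace, event hub name, secret scope, and table/checkpoint paths are placeholders rather than my exact job code, and I am assuming the Event Hubs Kafka-compatible endpoint here.

```python
# Rough sketch of the Event Hub -> bronze ingestion; all names and paths are placeholders.
connection = dbutils.secrets.get("iot-scope", "eventhub-connection-string")

bronze_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
    .option("subscribe", "<event-hub-name>")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
        f'required username="$ConnectionString" password="{connection}";',
    )
    .load()
)

# Land the raw events in the bronze Delta table.
(
    bronze_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/iot_bronze")
    .outputMode("append")
    .toTable("bronze.iot_events")
)
```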
My job fails about once a month with the error message “Cluster xxxx-221053-xxxxxxxx became unusable during the run since the driver became unhealthy.”
The support team has suggested implementing a frequent (weekly) restart of the streaming job to prevent such issues. To enable automatic restarts, I would need to create a time-triggered job that restarts the continuous job weekly using Databricks APIs.
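The restart task I have in mind would look roughly like the sketch below, assuming the Databricks SDK for Python (databricks-sdk) and that cancelling the active run of a continuous job causes the continuous trigger to launch a fresh run; JOB_ID is a placeholder.

```python
# Rough sketch of a scheduled weekly "restart" task for the continuous job.
# JOB_ID is a placeholder; verify in your workspace that cancelling the active
# run of a continuous job does start a replacement run.
from databricks.sdk import WorkspaceClient

JOB_ID = 123456789  # placeholder for the continuous job's ID

w = WorkspaceClient()  # reads host/token from environment variables or a config profile

# Cancel whatever run is currently active; the continuous trigger is then
# expected to start a new run on a fresh job cluster.
for run in w.jobs.list_runs(job_id=JOB_ID, active_only=True):
    w.jobs.cancel_run(run_id=run.run_id)
```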
Is there any alternative solution that allows me to process live streaming data without requiring periodic restarts?
12-18-2024 12:24 PM
Hi @Pramod_G,
To suggest a better approach, we would first need to understand why the driver became unhealthy. You can DM me the cluster and region details and I can try to check. From the Spark metrics, are there any resource issues?
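If it helps, something like the minimal sketch below logs per-batch metrics from a StreamingQueryListener, so you can see whether batch durations or input volumes are climbing before the driver goes unhealthy. The print-based logging is just for illustration, and `spark` is the ambient Databricks session.

```python
# Minimal sketch: log per-batch streaming metrics to spot resource pressure trends.
from pyspark.sql.streaming import StreamingQueryListener

class ProgressLogger(StreamingQueryListener):
    def onQueryStarted(self, event):
        print(f"query started: {event.id}")

    def onQueryProgress(self, event):
        p = event.progress
        # Rising batch durations or input rates can precede driver trouble.
        print(f"batch={p.batchId} inputRows={p.numInputRows} durations={p.durationMs}")

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        print(f"query terminated: {event.id} exception={event.exception}")

spark.streams.addListener(ProgressLogger())
```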
12-18-2024 02:42 PM
Sent you the cluster details and region via DM.
12-18-2024 12:26 PM
How are you ingesting the data? Are you using the Delta Live Tables mechanism - https://docs.databricks.com/en/delta-live-tables/index.html?
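For reference, a bronze-to-silver flow in Delta Live Tables looks roughly like the sketch below; the table names, the abbreviated Kafka options, and the expectation are illustrative only, not a drop-in replacement for your job.

```python
# Illustrative Delta Live Tables version of a bronze -> silver flow; names and
# options are hypothetical, and the ingestion options are abbreviated.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw IoT events from Event Hubs")
def iot_bronze():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
        .option("subscribe", "<event-hub-name>")
        .load()
    )

@dlt.table(comment="Validated IoT events")
@dlt.expect_or_drop("has_device_id", "device_id IS NOT NULL")
def iot_silver():
    return (
        dlt.read_stream("iot_bronze")
        .select(F.col("value").cast("string").alias("payload"))
        .withColumn("device_id", F.get_json_object("payload", "$.deviceId"))
    )
```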
12-18-2024 02:44 PM
Yes, I am using a Spark session with readStream and writeStream on Delta tables.
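Roughly like the sketch below for the bronze-to-silver step; the schema, the validation rule, and the checkpoint path here are placeholders rather than my exact code.

```python
# Sketch of the bronze -> silver step with readStream/writeStream; schema,
# validation rule, and paths are placeholders.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("event_time", TimestampType()),
])

silver_stream = (
    spark.readStream.table("bronze.iot_events")
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .filter(F.col("device_id").isNotNull())  # example validation rule
)

(
    silver_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/iot_silver")
    .outputMode("append")
    .toTable("silver.iot_events")
)
```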