Hi all,
I have a Databricks Workflow job in which the final task makes an external API call. Sometimes this API is overloaded and the call fails. In the spirit of automation, I want this task to retry the call an hour later, in the hopes that the API will have freed up by then. However, my job cluster stays running for that entire hour, pointlessly eating up a ton of resources. Is there a way to shut the job cluster down during the retry wait period and start it back up once the wait is over?
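For reference, here's roughly what the retry settings on that final task look like in the Jobs API JSON (the one-hour wait is `min_retry_interval_millis`; the task key and notebook path are placeholders for my actual ones):

```json
{
  "task_key": "call_external_api",
  "notebook_task": {
    "notebook_path": "/Workspace/jobs/call_external_api"
  },
  "max_retries": 3,
  "min_retry_interval_millis": 3600000,
  "retry_on_timeout": false
}
```

With this config the task does retry an hour later as intended, but the job cluster attached to the job sits idle for the whole interval.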
If not, what is the alternative? Use an all-purpose cluster with the lowest idle-shutdown time? We'd be paying all-purpose compute rates in that case, though, so I'm not sure the benefit would even outweigh the cost. Anything else helpful to note that I may be missing?
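In case it clarifies the alternative I'm weighing: I'd attach the task to an all-purpose cluster with `autotermination_minutes` set to the minimum, something like the sketch below (cluster name, runtime version, and sizing are just placeholders, not my real config):

```json
{
  "cluster_name": "api-retry-cluster",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 1,
  "autotermination_minutes": 10
}
```

The idea is that the cluster would shut itself down ~10 minutes into the hour-long wait and restart on the retry, but the restart would then be a cold start against all-purpose pricing, which is what makes me doubt the math works out.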