We use a "warmup" mechanism to get our DBR instance pool into a state where it has at-least-N instances. The logic is:
- For N repetitions:
- Request a new DBR cluster in the pool (which causes the pool to request an AWS instance)
- Wait for the cluster to report as RUNNING
- If it reports as TERMINATING, abandon this iteration
- Terminating the DBR cluster (to free it up for an upcoming real request)
Normally, this works fine. Lately, however, there's a weird issue: we hit the 1.1.1 situation ("If it reports as TERMINATING, abandon this iteration") for *all* clusters. I have occasionally seen this for 1 cluster (of, say, 40), but never for ALL of them.
What could cause DBR to transition a cluster to "TERMINATING" right after it's created?