Was it a one-time error or a recurring one?
For the former, I'd check whether your vCPU quota was exceeded, or whether there was a temporary issue with the cloud provider. It could be a lot of things (there are lots of moving parts under the hood).
For the latter, we will have to figure out where the problem is located: code, cluster config, job timing, and so on.
The approach is to rule out possible causes one by one.
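
If it helps to answer the first question, here is a minimal sketch for labelling a job's recent history as one-off or recurring. It assumes you have collected the outcomes of recent runs by hand (for example from the job run history); the function name and the input list are hypothetical and not tied to any particular platform API.

```python
from typing import Sequence


def classify_failures(run_succeeded: Sequence[bool]) -> str:
    """Label recent run history as 'no failures', 'one-off', or 'recurring'.

    run_succeeded is a hypothetical list of run outcomes, oldest first,
    where True means the run succeeded and False means it failed.
    """
    failures = sum(1 for ok in run_succeeded if not ok)
    if failures == 0:
        return "no failures"
    if failures == 1:
        return "one-off"
    return "recurring"


# Example: four recent runs with a single failure
print(classify_failures([True, True, False, True]))
```

A single failure points towards quota or a transient provider issue; repeated failures are the case where ruling out code, cluster config, and job timing is worth the effort.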