07-03-2025 01:32 AM
Hello,
For one of my clients, we are using an all-purpose cluster to run some Databricks notebooks. We noticed some Azure quota exceptions in the cluster logs, and we would like to know more about them.
As you can see in the attachment, the cluster always manages to reach its maximum node capacity (8 in this case), but it scales up gradually from one node count to the next (which is understandable given autoscaling). What we would like explained is the "error" message that appears almost every time it tries to increase the number of nodes: "Compute upsize complete, but below the target size. Operation could not be completed as it results in exceeding approved LowPriorityCores quota".
Note that this does not cause the job to fail, but I would still like to understand why these "failures" occur when upsizing.
Also, we are using Databricks on Azure, and we do not seem to have quota issues, as you can also see in the second attached image.
Thanks a lot for the help,
Sacha
07-03-2025 02:29 AM
Hi @sachamourier ,
I see that you have quotas available for Standard DDSv5 Family vCPUs. Is your cluster using this exact node type?
The QuotaExceeded error typically indicates that your request for additional resources for a specific VM size exceeds the limits currently allowed for your Azure subscription. This applies not only to the LowPriorityCores quota but also to other core quotas in the regions you deploy to.
I would also suggest reaching out to Azure VM support for further clarification on this issue.
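To make the "complete, but below the target size" message more concrete, here is a minimal Python sketch of the arithmetic that seems to be involved. It assumes the upsize is granted only up to the number of Spot nodes whose cores still fit under the LowPriorityCores limit; the function name and all numbers are hypothetical, since the thread does not state the actual limit or the vCPU count of the node type.

```python
def spot_nodes_that_fit(quota_limit, cores_in_use, cores_per_node, nodes_requested):
    """Return how many of the requested Spot nodes fit under the core quota.

    All arguments are vCPU or node counts. This is illustrative arithmetic only,
    not a description of how Azure or Databricks implement allocation.
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    # Cores still available under the quota (never negative).
    free_cores = max(quota_limit - cores_in_use, 0)
    # Only whole nodes can be provisioned.
    return min(nodes_requested, free_cores // cores_per_node)


# Hypothetical example: a limit of 20 Spot cores, 8 already in use,
# 4-core nodes, and autoscaling asking for 6 more nodes.
granted = spot_nodes_that_fit(quota_limit=20, cores_in_use=8,
                              cores_per_node=4, nodes_requested=6)
print(granted)  # 3
```

With these made-up numbers, only 3 of the 6 requested nodes fit, which would match an upsize that finishes below its target.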
07-04-2025 12:50 AM - edited 07-04-2025 12:53 AM
Hi @Vidhi_Khaitan ,
Thank you for your response.
Attached is the type of cluster we are using. Which quotas should we increase in the Azure Portal for this cluster family? Also, the Quotas page in the Azure Portal shows no usage for our type of cluster...
If needed, I will indeed reach out to Azure VM Support for further clarification.
Thank you,
Sacha
07-04-2025 02:02 AM
Hi @sachamourier ,
I noticed that you're using Spot instances in your compute, so you need to increase the vCPU quotas for Spot instances. See the following article on Microsoft Learn:
Increase spot vCPU family quotas - Azure Quotas | Microsoft Learn
Basically, you need to do the following:
1. To view the Quotas page, sign in to the Azure portal and enter "quotas" into the search box, then select Quotas.
2. On the Overview page, select Compute.
3. On the My quotas page, enter "spot" in the Search box.
4. Filter for any other requirements, such as Usage, as needed.
5. Find the quota or quotas you want to increase, and select them.
6. Near the top of the page, select New Quota Request, then select the way you'd like to increase the quota(s): Enter a new limit or Adjust the usage %.
7. If you selected Enter a new limit: In the New Quota Request pane, enter a numerical value for each new quota limit.
8. If you selected Adjust the usage %: In the New Quota Request pane, adjust the slider to a new usage percent. Adjusting the percentage automatically calculates the new limit for each quota to be increased. This option is particularly useful when the selected quotas have very high usage.
9. When you're finished, select Submit.
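If you would rather check the Spot quota from a script than from the portal, a rough Python sketch follows. It assumes you have already exported the regional usage list as JSON, for example with `az vm list-usage --location <region> --output json`, and that each entry has `name.value`, `currentValue`, and `limit` fields. The quota name `lowPriorityCores` is assumed from the error message, and the sample data below is invented for illustration.

```python
import json


def spot_quota_headroom(usages, quota_name="lowPriorityCores"):
    """Find one quota in a usage list and return its limit, usage, and free cores.

    `usages` is a list of dicts in the assumed shape described above.
    Returns None when the quota name is not present.
    """
    for entry in usages:
        # Compare case-insensitively, since the error message and the
        # usage list may capitalize the name differently.
        if entry["name"]["value"].lower() == quota_name.lower():
            limit = int(entry["limit"])          # may arrive as a string
            used = int(entry["currentValue"])    # may arrive as a string
            return {"limit": limit, "used": used, "free": limit - used}
    return None


# Invented sample in the assumed shape (not real subscription data).
sample = json.loads("""
[
  {"name": {"value": "cores", "localizedValue": "Total Regional vCPUs"},
   "currentValue": "16", "limit": "100"},
  {"name": {"value": "lowPriorityCores", "localizedValue": "Total Regional Low-priority vCPUs"},
   "currentValue": "0", "limit": "20"}
]
""")

print(spot_quota_headroom(sample))
```

Running this against a real export while the cluster is scaling up should show whether the Spot usage is actually approaching the limit.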
07-04-2025 02:15 AM
Hi @szymon_dybczak ,
Thank you for your help. I have followed your suggested steps and have the following output:
Should this be increased, then? It seems like it's not being used, though...
Sacha
07-04-2025 02:20 AM
Hi @sachamourier ,
Yes, I believe you need to increase that limit. Let's have a look at the previous screenshot you shared with us. I marked your current limit with a red circle. It matches the limit you have on your regional Spot vCPUs.
07-04-2025 02:28 AM
@szymon_dybczak That makes sense, thank you!
However, do you have any idea why it shows "Current usage: 0"?
And also, why does the cluster still manage to reach the max number of nodes even with these "failures"?
Sacha
07-04-2025 02:56 AM
Hi,
Regarding the current usage: it only means that, at that exact moment, no Spot vCPUs were actively provisioned in your subscription/region.
Regarding the second question, maybe @Vidhi_Khaitan will be able to answer.