All-purpose cluster upsize failures

sachamourier
Contributor

Hello,

For one of my clients, we are using an all-purpose cluster to run some Databricks notebooks. In the cluster logs we noticed some Azure quota exceptions, and we would like to understand them better.

As you can see in the attached screenshot, the cluster always manages to reach its maximum node count (8 in this case), although it scales up gradually from one node count to the next (which is understandable given autoscaling). What we would like to explain is the "error" message that appears on almost every upsize attempt: "Compute upsize complete, but below the target size. Operation could not be completed as it results in exceeding approved LowPriorityCores quota".

Note that this does not cause the job to fail, but I would still like to investigate and understand why these upsize "failures" occur.

Also, we are using Databricks on Azure, and we do not have quota issues, as you can see in the second attached image.

Thanks a lot for the help,

Sacha


Vidhi_Khaitan
Databricks Employee

Hi @sachamourier,

I see that you have quota available for the Standard DDSv5 Family vCPUs. Is your cluster using this exact node type?
The QuotaExceeded error typically indicates that your request for additional resources for a specific VM size exceeds the currently allowed limits for your Azure subscription. This applies not only to LowPriorityCores but also to other core quotas in the allocated regions.
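To make the quota message concrete, here is a minimal Python sketch of the arithmetic, assuming each Spot worker is a separate VM with a fixed vCPU count that is charged against the regional Spot (LowPriorityCores) vCPU limit. The function name and all numbers are hypothetical, not values read from this cluster or from any Azure API. It shows how a single upsize request can be only partially filled, which matches the "upsize complete, but below the target size" wording.

```python
def spot_nodes_granted(requested_nodes: int, vcpus_per_node: int,
                       spot_vcpu_limit: int, spot_vcpus_in_use: int) -> int:
    """Return how many of the requested Spot nodes fit under the regional
    Spot (LowPriorityCores) vCPU limit.

    Hypothetical model: every node has the same vCPU count, and no other
    capacity is freed while the request is being evaluated.
    """
    # vCPUs still available under the quota (never negative).
    free_vcpus = max(spot_vcpu_limit - spot_vcpus_in_use, 0)
    # Only whole VMs can be allocated, so round down.
    return min(requested_nodes, free_vcpus // vcpus_per_node)


# Hypothetical numbers: 4-vCPU workers, a 20-vCPU regional Spot limit,
# 4 workers (16 vCPUs) already running, and autoscaling asking for 4 more.
granted = spot_nodes_granted(requested_nodes=4, vcpus_per_node=4,
                             spot_vcpu_limit=20, spot_vcpus_in_use=16)
print(f"granted {granted} of 4 requested nodes")  # granted 1 of 4 requested nodes
```

In this model the remaining nodes are simply not created on that attempt, so the cluster ends up below its target size until more quota is available.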

I would also recommend reaching out to Azure VM support for further clarification on this issue.

sachamourier
Contributor

Hi @Vidhi_Khaitan,

Thank you for your response.

Attached is the type of cluster we are using. Which quotas should we increase in the Azure portal for this VM family? Also, the Azure portal Quotas page shows no usage for our type of cluster...

If needed, I will reach out to Azure VM Support for further clarification.

Thank you,

Sacha

 

szymon_dybczak
Esteemed Contributor III

Hi @sachamourier,

I noticed that you're using Spot instances in your compute, so you need to increase your vCPU quotas for Spot instances. See the article below in the Microsoft docs:
Increase spot vCPU family quotas - Azure Quotas | Microsoft Learn

Basically, you need to do the following:

1. To view the Quotas page, sign in to the Azure portal and enter "quotas" into the search box, then select Quotas.

2. On the Overview page, select Compute.

3. On the My quotas page, enter "spot" in the Search box.

4. Filter for any other requirements, such as Usage, as needed.

5. Find the quota or quotas you want to increase, and select them.

6. Near the top of the page, select New Quota Request, then select the way you'd like to increase the quota(s): Enter a new limit or Adjust the usage %.

7. If you selected Enter a new limit: In the New Quota Request pane, enter a numerical value for each new quota limit.

8. If you selected Adjust the usage %: In the New Quota Request pane, adjust the slider to a new usage percent. Adjusting the percentage automatically calculates the new limit for each quota to be increased. This option is particularly useful when the selected quotas have very high usage.

9. When you're finished, select Submit.
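For step 7, the value to enter can be estimated with simple sizing arithmetic. Here is a minimal Python sketch, assuming the driver runs on-demand and every worker is a Spot VM (both are assumptions; check your cluster configuration). The function name and the 4-vCPU node size are hypothetical and only illustrate the calculation.

```python
def required_spot_vcpu_limit(max_workers: int, vcpus_per_worker: int,
                             other_spot_vcpus: int = 0) -> int:
    """Smallest regional Spot vCPU limit that lets one cluster scale out to
    max_workers Spot workers, plus headroom for other Spot workloads that may
    run in the same subscription and region at the same time."""
    # Assumption: only workers are Spot; the driver is on-demand and counts
    # against the standard (non-Spot) VM family quota instead.
    return max_workers * vcpus_per_worker + other_spot_vcpus


# Hypothetical example: autoscaling up to 8 workers of a 4-vCPU node size.
print(required_spot_vcpu_limit(max_workers=8, vcpus_per_worker=4))  # 32
```

If several Spot clusters can scale out at the same time in that region, their worker vCPUs add up, which is what the `other_spot_vcpus` parameter stands in for.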

[Screenshot attached]

 

sachamourier
Contributor

Hi @szymon_dybczak,

Thank you for your help. I have followed your suggested steps and have the following output:

[Screenshot attached]

Should this be increased, then? It doesn't seem to be used, though...

Sacha

 



szymon_dybczak
Esteemed Contributor III

Hi @sachamourier,

Yes, I believe you need to increase that limit. Let's look at the previous screenshot you shared with us. I marked your current limit with a red circle; it matches the limit on your regional Spot vCPUs.

[Screenshot attached]

 

sachamourier
Contributor

@szymon_dybczak That makes sense, thank you!

However, do you have any idea why it shows "Current usage: 0"?

[Screenshot attached]

And why does the cluster still manage to reach the maximum number of nodes despite these "failures"?

Sacha



szymon_dybczak
Esteemed Contributor III

Hi, 

Regarding the current usage: it only means that at that exact moment, no Spot vCPUs were actively provisioned in your subscription/region.
Regarding your second question, maybe @Vidhi_Khaitan will be able to answer.
