
Parallel Processing: Pool with 8 Cores and Standard Instance with 28 GB

NanthakumarYoga
New Contributor

Hi Team,

I need your input on designing the pool for our parallel processing.

We process files of around 4 to 5 GB. The process adds a row number, removes the header and trailer, and adds 8 additional columns that are calculated over all 104 columns of each record. Essentially, this is a CSV file loaded into Delta table storage after some basic validations. Each day's load consists of two tasks (Validation and Transformation).

When we process continuously for 5 days, our jobs fail with a MAX_POOL_CAPACITY error.

Initially we had 20 instances, and we have now increased to 40. Still, 1 or 2 jobs fail, which is strange.

Could you please guide me here?

There are no joins; it is a straightforward 1:1 load from CSV to Delta with some basic data quality checks.

Regards, Nantha.

3 REPLIES

Kaniz
Community Manager

Hi @NanthakumarYoga , 

The MAX_POOL_CAPACITY error may be due to:


 - Insufficient instances in the pool to fulfil cluster creation requests.
 - Inefficient management of the instance pool, like using a single pool for multiple concurrent cluster launches without adjusting capacity.
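To make the capacity point concrete, here is a minimal Python sketch of the arithmetic. It assumes every job cluster takes its driver and all of its workers from the same pool, and that each task runs on its own job cluster. The worker count of 4 is a hypothetical value chosen for illustration; it is not stated in the question.

```python
def instances_per_cluster(workers: int, driver_from_pool: bool = True) -> int:
    """Instances one cluster takes from the pool: its workers plus,
    if the driver also comes from the pool, one more for the driver."""
    return workers + (1 if driver_from_pool else 0)


def max_concurrent_clusters(pool_max_capacity: int, workers: int,
                            driver_from_pool: bool = True) -> int:
    """How many clusters of this size fit in the pool at the same time."""
    return pool_max_capacity // instances_per_cluster(workers, driver_from_pool)


def required_capacity(concurrent_clusters: int, workers: int,
                      driver_from_pool: bool = True) -> int:
    """Pool capacity needed for this many clusters to run simultaneously."""
    return concurrent_clusters * instances_per_cluster(workers, driver_from_pool)


if __name__ == "__main__":
    workers = 4  # hypothetical cluster size, not taken from the question
    print(max_concurrent_clusters(20, workers))  # original pool of 20 instances
    print(max_concurrent_clusters(40, workers))  # enlarged pool of 40 instances
    # 5 days x 2 tasks per day, if all of them ran at the same time
    print(required_capacity(5 * 2, workers))
```

Under these assumptions, a pool of 40 instances holds 8 such clusters at once, while 10 simultaneous clusters would need 50 instances. That gap would be consistent with only 1 or 2 jobs failing after the increase, but the real numbers depend on the actual cluster sizes and on how many tasks overlap.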


To troubleshoot:

 - Use the Cluster Insights tool for more information.
 - Analyze the instance pool metrics events in the usage_logs table, such as instances prewarmed, new VMs requested, and VMs used by clusters.
 - If the instance pool uses spot instances, Azure may have evicted the driver's spot instance without notification, which causes intermittent Databricks job failures.
 - If you are also seeing REQUEST_LIMIT_EXCEEDED errors, provision fewer nodes through instance pools and use larger instance types for the clusters, so that instances stay cached and churn less.
 - Preload the runtime on the instance pool to improve cluster start times, and investigate the node daemon log if the issue persists.
 - Check whether the instance count reported by the instance pool API matches the number of instances in the pool. The cluster monitor might not be tracking the correct upsize request.
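As a rough illustration of the last check, the sketch below computes the remaining headroom from a pool description. It assumes the dictionary has the shape of an Instance Pools API "get" response, with a `max_capacity` field and a `stats` object holding `used_count`, `idle_count`, `pending_used_count`, and `pending_idle_count`; verify these field names against your workspace's API documentation. The sample payload is made up for illustration.

```python
def pool_headroom(pool: dict) -> dict:
    """Summarise how many instances a pool holds and how many it can still add.

    `pool` is assumed to look like an Instance Pools API "get" response.
    Missing counters are treated as 0; a missing max_capacity means no limit.
    """
    stats = pool.get("stats", {})
    total = (
        stats.get("used_count", 0)            # attached to running clusters
        + stats.get("idle_count", 0)          # warm and waiting in the pool
        + stats.get("pending_used_count", 0)  # being acquired for a cluster
        + stats.get("pending_idle_count", 0)  # being acquired as idle spares
    )
    max_capacity = pool.get("max_capacity")
    remaining = None if max_capacity is None else max_capacity - total
    return {"total": total, "max_capacity": max_capacity, "remaining": remaining}


def can_fit(pool: dict, instances_needed: int) -> bool:
    """True if a new cluster needing this many instances fits under the cap.

    Idle instances are reused first, so they count as available.
    """
    summary = pool_headroom(pool)
    if summary["remaining"] is None:
        return True
    idle = pool.get("stats", {}).get("idle_count", 0)
    return instances_needed <= summary["remaining"] + idle


if __name__ == "__main__":
    # Hypothetical payload, not real output from any workspace.
    sample = {
        "max_capacity": 40,
        "stats": {"used_count": 30, "idle_count": 4,
                  "pending_used_count": 5, "pending_idle_count": 0},
    }
    print(pool_headroom(sample))
    print(can_fit(sample, 5))
```

Running a check like this shortly before a job launches, with the number of instances the job cluster needs, can indicate whether the launch is likely to hit the pool's cap.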

Kaniz
Community Manager

Hi @NanthakumarYoga, we haven't heard from you since my last response, and I am checking back to see whether my suggestions helped.

If you have found a solution, please share it with the community, as it can help others.

Also, please don't forget to click the "Accept as Solution" button whenever the information provided helps resolve your question.

siddhathPanchal
New Contributor III

Hi Nanthakumar. I also agree with the above solution. If it works for you, don't forget to press the 'Accept as Solution' button.
