Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parallel Processing: Pool with 8 Cores and Standard Instance with 28 GB

NanthakumarYoga
New Contributor

Hi Team,

Need your inputs on designing the pool for our parallel processing.

We are processing files of around 4 to 5 GB each. The job adds a row number, removes the header and trailer records, and derives an additional 8 columns computed over all 104 columns per record. It is essentially a CSV-to-Delta-table load after some basic validations. Each day carries two tasks (validation and transformation).

When we process continuously for 5 days, our jobs fail with a MAX_POOL_CAPACITY error.

Initially we had 20 instances, and we have now increased to 40. Still, 1 or 2 jobs fail, which is strange.

Would you please guide me here?

There are no joins; it is a straightforward 1:1 load from CSV to Delta with some basic data quality checks.
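For reference, a minimal PySpark sketch of this kind of load, assuming the file arrives as a headerless CSV; the path, target table name, and derived-column expressions below are placeholders, not the actual job code:

```python
def is_data_row(row_num: int, total_rows: int) -> bool:
    """True for records between the header (row 1) and the trailer (last row)."""
    return 1 < row_num < total_rows


def csv_to_delta(spark, src_path: str, target_table: str) -> None:
    """Sketch of the validation + transformation tasks: number the rows,
    drop the header/trailer records, derive extra columns, write to Delta."""
    # pyspark imports kept local so the sketch reads without Spark installed
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    df = spark.read.option("header", "false").csv(src_path)

    # Dense 1..N row number in file order
    w = Window.orderBy(F.monotonically_increasing_id())
    df = df.withColumn("row_num", F.row_number().over(w))

    # Same predicate as is_data_row, expressed on columns
    total = df.count()
    df = df.filter((F.col("row_num") > 1) & (F.col("row_num") < F.lit(total)))

    # Placeholder for the 8 derived columns computed over the 104 source columns
    for i in range(1, 9):
        df = df.withColumn(f"derived_{i}", F.lit(None).cast("string"))

    df.write.format("delta").mode("append").saveAsTable(target_table)
```

Since there are no joins, each file's work is independent, which is what makes the jobs a good fit for running in parallel against a pool.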

Regards, Nantha.

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @NanthakumarYoga , 

The MAX_POOL_CAPACITY error may be due to:

- Insufficient instances in the pool to fulfil cluster creation requests.
- Inefficient management of the instance pool, such as using a single pool for multiple concurrent cluster launches without adjusting its capacity.

To troubleshoot:

- Use the Cluster Insights tool for more information.
- Analyze stats on instances prewarmed, new VMs requested, VMs used by clusters, etc., using the instance pool-related metrics events in the usage_logs table.
- If the instance pool uses spot instances, the driver's spot instance may have been evicted by Azure without notification, causing intermittent Databricks job failures.
- If you see REQUEST_LIMIT_EXCEEDED errors, reduce the number of nodes provisioned through instance pools and use larger instance types for the clusters, so that instances are cached rather than churned.
- Preload the Databricks Runtime on the instance pool to improve cluster start times, and investigate the node daemon log if the issue persists.
- Check whether the Instance Pools API returns the same value as the actual number of instances; the cluster monitor might not be tracking the correct upsize request.
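To compare the pool's reported stats against its configured capacity, you can poll the Instance Pools API directly. A sketch, assuming authentication with a personal access token; the `stats` field names follow the Instance Pools API 2.0 response (`used_count`, `idle_count`, `pending_used_count`, `pending_idle_count`), and `free_slots` is a hypothetical helper, not a Databricks API:

```python
import json
import urllib.request


def free_slots(stats: dict, max_capacity: int) -> int:
    """Remaining headroom in the pool: max_capacity minus every instance
    the pool currently accounts for (used, idle, and pending)."""
    in_use = (
        stats.get("used_count", 0)
        + stats.get("idle_count", 0)
        + stats.get("pending_used_count", 0)
        + stats.get("pending_idle_count", 0)
    )
    return max_capacity - in_use


def get_pool(host: str, token: str, pool_id: str) -> dict:
    """Fetch one pool's definition and live stats via the Instance Pools API."""
    req = urllib.request.Request(
        f"{host}/api/2.0/instance-pools/get?instance_pool_id={pool_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (not run here):
#   pool = get_pool(host, token, pool_id)
#   print(free_slots(pool["stats"], pool["max_capacity"]))
```

When free_slots reaches 0, any further cluster-create request against the pool fails with MAX_POOL_CAPACITY, which would match the pattern of only 1 or 2 jobs failing at peak concurrency.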

Kaniz_Fatma
Community Manager

Hi @NanthakumarYoga, we haven't heard from you since my last response, and I was checking back to see if my suggestions helped you.

If you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Accept as Solution" button whenever the information provided helps resolve your question.

siddhathPanchal
New Contributor III

Hi Nanthakumar. I agree with the solution above. If it works for you, don't forget to press the "Accept as Solution" button.
