Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parallel Processing: Pool with 8 Cores and Standard Instance with 28 GB

NanthakumarYoga
New Contributor II

Hi Team,

Need your inputs here on designing the pool for our parallel processing.

We are processing files of around 4 to 5 GB each (the process adds a row number, removes the header/trailer, and adds 8 additional columns calculated over all 104 columns per record). It is a CSV-file-to-Delta-table load after performing some basic validations. Each day carries two tasks (Validation and Transformation).

When we process continuously for 5 days, our jobs fail with a MAX_POOL_CAPACITY error.

Initially we had 20 instances, and we have now increased to 40 instances. Still, 1 or 2 jobs fail, which is strange.
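A MAX_POOL_CAPACITY error generally means the pool cannot hand out enough instances for everything running at once. A minimal sketch of the capacity arithmetic, assuming hypothetical numbers (one driver plus seven workers per job; the actual cluster sizes are not stated in the post):

```python
# Hypothetical illustration of instance-pool capacity math.
# The workers-per-job figure below is an assumption, not taken from
# the actual workspace configuration.
def max_concurrent_jobs(pool_instances: int, workers_per_job: int,
                        drivers_per_job: int = 1) -> int:
    """Each running job holds one driver plus its workers from the pool."""
    return pool_instances // (workers_per_job + drivers_per_job)

# With a 40-instance pool and jobs that each hold 1 driver + 7 workers,
# only 5 jobs can run at once; a 6th request would exceed pool capacity.
print(max_concurrent_jobs(40, 7))  # -> 5
print(max_concurrent_jobs(20, 7))  # -> 2
```

If retries or overlapping daily runs push concurrent demand above that quotient, the occasional failure of 1 or 2 jobs is exactly what you would expect, even after doubling the pool.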

Would you please guide me here?

There are no joins; it is a straightforward 1:1 load from CSV to Delta with some basic data quality checks.
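The per-record logic described above (row numbering, header/trailer removal, derived columns) might be sketched in plain Python like this. On Databricks this would normally be expressed with Spark DataFrame operations; the derived-column calculation here is a hypothetical placeholder, since the actual formulas are not given in the post:

```python
import csv
import io

def transform(csv_text: str, derived_cols: int = 8):
    """Drop the header and trailer rows, prepend a row number, and append
    derived columns (here: a hypothetical sum over the numeric fields,
    repeated, standing in for the real 8 calculated columns)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    body = rows[1:-1]  # remove header (first row) and trailer (last row)
    out = []
    for i, rec in enumerate(body, start=1):
        numeric = [float(x) for x in rec if x.replace(".", "", 1).isdigit()]
        total = sum(numeric)  # placeholder calculation over all columns
        out.append([i] + rec + [total] * derived_cols)
    return out

sample = "h1,h2\n1,2\n3,4\nTRAILER,0\n"
print(transform(sample, derived_cols=2))
# -> [[1, '1', '2', 3.0, 3.0], [2, '3', '4', 7.0, 7.0]]
```

Since the load is 1:1 with no joins, each file's work is independent, which is why the bottleneck is pool capacity rather than the transformation itself.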

Regards, Nantha.

1 REPLY

siddhathPanchal
Databricks Employee

Hi Nanthakumar. I also agree with the above solution. If this solution works for you, don't forget to press the 'Accept as Solution' button.
