Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parallel Processing: Pool with 8 Cores and Standard Instance with 28 GB

NanthakumarYoga
New Contributor II

Hi Team,

We need your input on designing the pool for our parallel processing.

We are processing files of around 4 to 5 GB each. The process adds a row number, removes the header/trailer records, and adds 8 additional columns that are calculated over all 104 columns per record. It is a CSV-file-to-Delta-table load performed after some basic validations. Each day runs two tasks (validation and transformation).

When we process continuously for 5 days, our jobs fail with a MAX_POOL_CAPACITY error.

Initially we had 20 instances, and we have now increased to 40. Still, 1 or 2 jobs fail, which is strange.
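For context, a MAX_POOL_CAPACITY error is raised when running clusters try to draw more instances from the pool than its configured maximum. A back-of-envelope check (with hypothetical numbers, since the per-job cluster sizes are not stated above) shows how 40 instances can still be exceeded:

```python
def required_pool_capacity(concurrent_jobs, instances_per_job):
    """Rough sizing check: peak instances drawn from a shared pool,
    assuming each job acquires a fixed-size cluster from that pool."""
    return concurrent_jobs * instances_per_job

# Hypothetical example: 6 jobs overlapping, each needing 8 instances,
# asks for 48 instances and would exceed a 40-instance pool cap.
peak = required_pool_capacity(6, 8)
print(peak, peak > 40)
```

If jobs from several days overlap (for example, because earlier runs are slow or retrying), the peak demand can be much higher than the steady-state estimate.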

Could you please guide me here?

There are no joins; it is a straightforward 1:1 load from CSV to Delta with some basic data quality checks.
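For reference, the per-record transformation described above (header/trailer removal, row numbering, derived columns) can be sketched in plain Python. On Databricks the same logic would normally be expressed with PySpark, and the actual derivation rules for the 8 extra columns are not given in the post, so a simple row-level computation stands in for them here:

```python
import csv
import io

def transform(csv_text, num_derived=8):
    """Minimal sketch of the described load: drop the header and
    trailer records, add a row number, run a basic validation, and
    append derived columns computed from the existing fields.
    The derived-column logic here is a placeholder."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    body = rows[1:-1]  # strip header and trailer records
    out = []
    for i, rec in enumerate(body, start=1):
        # basic data quality check: skip records with empty fields
        if any(field == "" for field in rec):
            continue
        # placeholder for the 8 derived columns computed per record
        derived = [str(sum(len(f) for f in rec) + k) for k in range(num_derived)]
        out.append([str(i)] + rec + derived)
    return out

result = transform("H,header\na,b\nc,d\nT,trailer\n")
```

Each output record is the row number, the original fields, then the derived columns; in Spark the row number would typically come from a window function or `monotonically_increasing_id` rather than a Python counter.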

Regards, Nantha.

1 REPLY

siddhathPanchal
Databricks Employee

Hi Nanthakumar. I also agree with the solution above. If it works for you, don't forget to press the 'Accept as Solution' button.
