Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parallel Processing: Pool with 8 Cores and Standard Instance with 28 GB

NanthakumarYoga
New Contributor

Hi Team,

Need your inputs on designing the pool for our parallel processing.

We are processing files of around 4 to 5 GB each. The job adds a row number, removes the header and trailer records, and derives an additional 8 columns computed over all 104 columns per record. It is essentially a CSV-to-Delta-table load after some basic validations. Each day carries two tasks (validation and transformation).

When we process continuously for 5 days, our jobs fail with a MAX_POOL_CAPACITY error.

Initially we had 20 instances, and we have now increased to 40. Still, 1 or 2 jobs fail, which is strange.

Would you please guide me here?

There are no joins; it is a straightforward 1:1 load from CSV to Delta with some basic data quality checks.
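For reference, a minimal PySpark sketch of this kind of load, assuming the file arrives as a headerless CSV; the path, target table name, and derived-column expressions below are placeholders, not the actual job code:

```python
def is_data_row(row_num: int, total_rows: int) -> bool:
    """True for records between the header (row 1) and the trailer (last row)."""
    return 1 < row_num < total_rows


def csv_to_delta(spark, src_path: str, target_table: str) -> None:
    """Sketch of the validation + transformation tasks: number the rows,
    drop the header/trailer records, derive extra columns, write to Delta."""
    # pyspark imports kept local so the sketch reads without Spark installed
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    df = spark.read.option("header", "false").csv(src_path)

    # Dense 1..N row number in file order
    w = Window.orderBy(F.monotonically_increasing_id())
    df = df.withColumn("row_num", F.row_number().over(w))

    # Same predicate as is_data_row, expressed on columns
    total = df.count()
    df = df.filter((F.col("row_num") > 1) & (F.col("row_num") < F.lit(total)))

    # Placeholder for the 8 derived columns computed over the 104 source columns
    for i in range(1, 9):
        df = df.withColumn(f"derived_{i}", F.lit(None).cast("string"))

    df.write.format("delta").mode("append").saveAsTable(target_table)
```

Since there are no joins, each file's work is independent, which is what makes the jobs a good fit for running in parallel against a pool.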

Regards, Nantha.

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @NanthakumarYoga , 

The MAX_POOL_CAPACITY error may be due to:

- Insufficient instances in the pool to fulfil cluster creation requests.
- Inefficient management of the instance pool, such as using a single pool for multiple concurrent cluster launches without adjusting its capacity.

To troubleshoot:

- Use the Cluster Insights tool for more information.
- Analyze stats on instances prewarmed, new VMs requested, VMs used by clusters, etc., using the instance pool-related metrics events in the usage_logs table.
- If the instance pool uses spot instances, the driver's spot instance may have been evicted by Azure without notification, causing intermittent Databricks job failures.
- If you see REQUEST_LIMIT_EXCEEDED errors, reduce the number of nodes provisioned through instance pools and use larger instance types for the clusters, so that instances are cached rather than churned.
- Preload the Databricks Runtime on the instance pool to improve cluster start times, and investigate the node daemon log if the issue persists.
- Check whether the Instance Pools API returns the same value as the actual number of instances; the cluster monitor might not be tracking the correct upsize request.
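To compare the pool's reported stats against its configured capacity, you can poll the Instance Pools API directly. A sketch, assuming authentication with a personal access token; the `stats` field names follow the Instance Pools API 2.0 response (`used_count`, `idle_count`, `pending_used_count`, `pending_idle_count`), and `free_slots` is a hypothetical helper, not a Databricks API:

```python
import json
import urllib.request


def free_slots(stats: dict, max_capacity: int) -> int:
    """Remaining headroom in the pool: max_capacity minus every instance
    the pool currently accounts for (used, idle, and pending)."""
    in_use = (
        stats.get("used_count", 0)
        + stats.get("idle_count", 0)
        + stats.get("pending_used_count", 0)
        + stats.get("pending_idle_count", 0)
    )
    return max_capacity - in_use


def get_pool(host: str, token: str, pool_id: str) -> dict:
    """Fetch one pool's definition and live stats via the Instance Pools API."""
    req = urllib.request.Request(
        f"{host}/api/2.0/instance-pools/get?instance_pool_id={pool_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (not run here):
#   pool = get_pool(host, token, pool_id)
#   print(free_slots(pool["stats"], pool["max_capacity"]))
```

When free_slots reaches 0, any further cluster-create request against the pool fails with MAX_POOL_CAPACITY, which would match the pattern of only 1 or 2 jobs failing at peak concurrency.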

Kaniz_Fatma
Community Manager

Hi @NanthakumarYoga, we haven't heard from you since my last response, and I was checking back to see if my suggestions helped you.

If you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Accept as Solution" button whenever the information provided helps resolve your question.

siddhathPanchal
New Contributor III

Hi Nanthakumar. I agree with the solution above. If it works for you, don't forget to press the "Accept as Solution" button.
