Hi everyone,
I am trying to run a Databricks notebook in parallel using ThreadPoolExecutor.
Can anyone suggest how to reduce the time taken, based on my findings so far?
Current Performance:
Time taken - 25 minutes
ThreadPoolExecutor max_workers - 24
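For reference, the pattern is roughly the following. This is a simplified sketch, not the exact code: the notebook path, the parameter sets, and the helper names run_notebook and run_all are placeholders, and it assumes the notebook is launched with dbutils.notebook.run (stubbed out here so the snippet runs anywhere).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def run_notebook(path, params):
    # Placeholder for the real call. On Databricks this would be something like:
    #   dbutils.notebook.run(path, 3600, params)
    # The sleep just stands in for the notebook's run time.
    time.sleep(0.01)
    return f"{path}:{params['id']}"


def run_all(param_sets, path="/hypothetical/notebook", max_workers=24,
            runner=run_notebook):
    """Submit one notebook run per parameter set and collect the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the id of the parameter set it was given.
        futures = {pool.submit(runner, path, p): p["id"] for p in param_sets}
        for fut in as_completed(futures):
            # fut.result() re-raises any exception from the notebook run.
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    out = run_all([{"id": i} for i in range(48)])
    print(f"{len(out)} runs finished in {time.perf_counter() - start:.2f}s")
```

Note that the threads only submit and wait on the driver; the actual work of each notebook run is done by the cluster.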
Current cluster configuration:
DBR - 9.1 LTS
Min workers - 2
Max workers - 6
Number of cores - 4 per worker
Memory - 14 GB per worker
Auto Scaling enabled
I tried increasing the number of workers to 18, hoping it would reduce the time taken, but it didn't help.
Any thoughts on how to reduce the time?