I have a process in Azure Data Factory that loads CDC changes from SQL Server and then triggers a notebook that merges the data into the bronze and silver zones. A single notebook takes about 1 minute to run, but when all 50 notebooks are fired at once the whole process takes 25 minutes.
There are not many changes in the SQL tables. When the notebooks run, the cluster has to scale up, and that adds a lot of time before everything finishes.
Is it really a big deal for a cluster to run 50 notebooks in parallel?
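For context, one workaround I considered is replacing the 50 simultaneous ADF triggers with a single driver notebook that fans out to the merge notebooks through a bounded thread pool, so the cluster never sees the full burst at once. A minimal sketch, assuming each child would be launched with `dbutils.notebook.run`; `run_merge` and the table list here are stand-ins, not my actual code:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the real child call, which on Databricks would be roughly:
#   dbutils.notebook.run(f"/cdc_merges/{table}", timeout_seconds=600)
def run_merge(table: str) -> str:
    return f"merged {table}"

# Placeholder list of the 50 tables handled by the 50 notebooks.
tables = [f"table_{i:02d}" for i in range(50)]

# Cap concurrency at 8 instead of firing all 50 notebooks at once,
# so the cluster does not have to scale up to absorb the burst.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_merge, tables))

print(len(results))
```

The idea is that with `max_workers` tuned to what 2 workers can handle, the jobs queue briefly instead of forcing an autoscale event, but I'm not sure if that beats just letting the cluster scale.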
cluster config: 12.2 LTS, shared access mode
Photon enabled
workers: 2-8 x Standard_DS3_v2
driver: Standard_DS3_v2
Here is a screenshot from Ganglia; the load starts at 06:00.