Performance issue: Running 50 notebooks from ADF

alesventus — Tue, 03 Oct 2023 13:28:06 GMT

I have a process in Data Factory that loads CDC changes from SQL Server and then triggers a notebook that merges them into the bronze and silver zones. A single notebook takes about 1 minute to run, but when all 50 notebooks are fired at once the whole process takes 25 minutes.

There are not many changes in the SQL tables. When the notebooks run, the cluster has to scale up, and the run takes much longer to finish.

Is it really a big deal for a cluster to run 50 notebooks in parallel?

Cluster config: 12.2 LTS, access mode shared

Photon enabled

Workers: 2-8 Standard DS3 v2

Driver: Standard DS3 v2
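
For a rough sense of scale, here is a back-of-the-envelope sketch in Python of how many worker cores this config offers compared with 50 concurrent notebooks. The 4 vCPUs per DS3 v2 node is an assumption (it is not stated above), and the names are only illustrative, so treat the numbers as a sizing estimate rather than a measurement.

```python
# Back-of-the-envelope capacity check for the cluster config above.
# Assumption: a Standard DS3 v2 node has 4 vCPUs (not stated in the post).
CORES_PER_NODE = 4
MIN_WORKERS = 2   # lower autoscaling bound from the config
MAX_WORKERS = 8   # upper autoscaling bound from the config
NOTEBOOKS = 50    # notebooks fired at once from ADF


def worker_cores(workers: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Total worker cores for a given number of worker nodes."""
    return workers * cores_per_node


min_cores = worker_cores(MIN_WORKERS)
max_cores = worker_cores(MAX_WORKERS)

print(f"worker cores before scale-up: {min_cores}")
print(f"worker cores at max scale:    {max_cores}")
print(f"notebooks per core before scale-up: {NOTEBOOKS / min_cores:.2f}")
print(f"notebooks per core at max scale:    {NOTEBOOKS / max_cores:.2f}")
```

Under that assumption the 50 notebooks share 8 worker cores until autoscaling adds nodes, and 32 at most, while all of them run on the same single driver. This only sizes the contention; it does not show where the 25 minutes actually go.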

Here is a screenshot from Ganglia; the load starts at 06:00.
