Performance issue: Running 50 notebooks from ADF

alesventus
New Contributor III

I have a process in Data Factory that loads CDC changes from SQL Server and then triggers a notebook that merges them into the bronze and silver zones. A single notebook takes about 1 minute to run, but when all 50 notebooks are fired at once, the whole process takes 25 minutes.

There are not many changes in the SQL tables. When the notebooks run, the cluster has to scale up, and everything takes much longer to finish.
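A quick back-of-envelope check, using only the numbers in the post, shows how little parallel speed-up the cluster is delivering. It assumes, as a simplification, that each notebook would still take about 1 minute if the 50 ran one after another:

```python
# Figures reported in the question.
NOTEBOOKS = 50
MINUTES_PER_NOTEBOOK = 1.0   # runtime of a single notebook on its own
OBSERVED_WALL_CLOCK = 25.0   # minutes when all 50 are triggered together

# Total work if the notebooks ran strictly one after another (assumption:
# per-notebook runtime does not change when run serially).
serial_minutes = NOTEBOOKS * MINUTES_PER_NOTEBOOK

# How many notebooks were, on average, effectively making progress at once.
effective_parallelism = serial_minutes / OBSERVED_WALL_CLOCK

print(f"serial estimate: {serial_minutes:.0f} min")
print(f"effective parallelism: {effective_parallelism:.1f}x")
```

In other words, firing all 50 at once behaves roughly like running only two notebooks at a time, which points at contention on the cluster rather than at the size of the CDC changes.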

Is it really such a big deal for a cluster to run 50 notebooks in parallel?

Cluster config:
- Runtime: 12.2 LTS, access mode: shared
- Photon enabled
- Workers: Standard DS3 v2, autoscaling from 2 to 8
- Driver: Standard DS3 v2

[Screenshot from Ganglia: load starts at 06:00]


Kaniz
Community Manager

Hi @alesventus,

Running many notebooks simultaneously on one cluster can overload it and increase the total processing time.


- Your current cluster configuration may not be sufficient to handle 50 resource-intensive notebooks at once.
- Databricks recommends specific practices for production workloads, such as controlling batch size and frequency and understanding Structured Streaming.
- Stateful streaming should be used carefully to avoid unexpected latency and production problems.
- Databricks notebooks must be attached to a cluster to run queries interactively, and they can be scheduled as jobs for automated deployment and recovery.
- Monitoring and managing the cluster is crucial to handling the workload efficiently.
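One way to act on the "control batch size" advice is to cap how many merges run at the same time instead of launching all 50 at once. The sketch below is a generic bounded-concurrency helper, not a Databricks or ADF feature: `run_bounded`, `fake_merge`, and the `/hypothetical/...` paths are illustrative names, and the right `max_parallel` value for a 2 to 8 worker DS3 v2 cluster would have to be found by testing.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_bounded(
    paths: Iterable[str],
    run_one: Callable[[str], str],
    max_parallel: int = 8,
) -> Dict[str, str]:
    """Run run_one(path) for every path, with at most max_parallel in flight.

    Returns a dict mapping each path to its result. An exception raised by
    run_one is re-raised when its future is collected.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(run_one, p): p for p in paths}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    import threading
    import time

    active = 0
    peak = 0
    lock = threading.Lock()

    def fake_merge(path: str) -> str:
        # Hypothetical stand-in for a real merge notebook; it only records
        # how many calls are running at the same moment.
        global active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)
        with lock:
            active -= 1
        return f"merged {path}"

    tables = [f"/hypothetical/cdc_merge_{i}" for i in range(50)]
    out = run_bounded(tables, fake_merge, max_parallel=8)
    print(len(out), "merges finished; peak concurrency", peak)
```

Inside a single Databricks driver notebook, `run_one` could be something like `lambda p: dbutils.notebook.run(p, 600)`, which runs each child notebook on the same cluster with a 600-second timeout. If the orchestration stays in Data Factory, the ForEach activity's batch count setting caps parallel iterations in a similar way.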
