Databricks Community

Farzana · 08-04-2023

Hi Team,

We have an Azure Data Factory (ADF) pipeline that runs a set of activities before calling our Azure Databricks notebooks. Whenever the notebooks are called, the pipeline launches a new cluster for every job, using Standard F4 job compute with a single worker node. Launching the cluster alone takes about 7 minutes, which increases the overall ADF pipeline run time.

Could you please suggest a solution to reduce the cluster launch time?

Note: Our ADF pipeline has an event-based trigger that fires whenever a file arrives in ADLS. We cannot keep a cluster created and running all the time because of the cost.

Thanks

Farzana · 08-04-2023

@Retired_mod Thanks for the response. Could you please elaborate on what you mean by preloading the runtime on an instance pool?

Wouldn't even the cluster pool need to run continuously (since there is no fixed schedule for files arriving in ADLS) in order to reduce the cluster launch time for every job and keep the overall ADF pipeline run time short?

Please help me understand what "preload the runtime on the instance pool" means.
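
To make the question concrete, here is a minimal sketch of what I think such a pool definition might look like. It assumes the field names of the Databricks Instance Pools API (`min_idle_instances`, `idle_instance_autotermination_minutes`, `preloaded_spark_versions`). The pool name, the node type ID `Standard_F4`, and the runtime version string are placeholders I made up for illustration, not values from our workspace. The sketch only builds the request body and does not call any API.

```python
import json


def build_pool_payload(name, node_type_id, spark_version,
                       min_idle=0, idle_timeout_minutes=15, max_capacity=2):
    """Build a request body for creating an instance pool (assumed field names)."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # 0 idle instances means nothing is kept warm between runs.
        "min_idle_instances": min_idle,
        "max_capacity": max_capacity,
        # Idle instances above the minimum are released after this many minutes.
        "idle_instance_autotermination_minutes": idle_timeout_minutes,
        # Runtime version to preload on the pool's idle instances.
        "preloaded_spark_versions": [spark_version],
    }


# Placeholder values for illustration only.
payload = build_pool_payload("adf-job-pool", "Standard_F4", "13.3.x-scala2.12")
print(json.dumps(payload, indent=2))
```

If I read this correctly, with `min_idle_instances` set to 0 the first job after a quiet period would still wait for VMs to be provisioned, and setting it to 1 or more would keep instances running and add cost. That trade-off is exactly what I would like to have confirmed.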

Thanks