01-16-2024 01:30 PM
I have found that if you use a job cluster from a pool in ADF, it creates a cluster per Databricks ADF activity, and you end up with more than one cluster running.
Instead, I have a shared compute cluster for ADF with Photon/Unity enabled and a fixed worker count. I start the Databricks cluster via the REST API before the ETL runs, which saves 5 to 10 minutes of cluster start-up time.
Once the cluster is up, the ETL runs the notebooks via the Databricks ADF activity, and after the ETL has finished I stop the cluster, again using the REST API.
It works well and gives you control over what gets spun up. You can also use spot instances to save resource costs.
API Reference: https://docs.databricks.com/api/workspace/clusters/ (start and terminate endpoints)
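For anyone who wants to try this, here is a rough sketch of the start and stop calls in Python, using only the standard library. The function names, the workspace host, the token, and the cluster ID are placeholders of my own for illustration, not anything from ADF or a Databricks SDK. It assumes the `/api/2.0/clusters/start`, `/api/2.0/clusters/delete` and `/api/2.0/clusters/get` endpoints with a personal access token; note that `delete` terminates the cluster rather than removing it permanently.

```python
import json
import urllib.parse
import urllib.request


def build_cluster_request(host, token, action, cluster_id):
    """Build a POST request for /api/2.0/clusters/<action>.

    action is "start" or "delete" ("delete" terminates the cluster).
    host, token and cluster_id are placeholders supplied by the caller.
    """
    if action not in ("start", "delete"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{host.rstrip('/')}/api/2.0/clusters/{action}"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def build_state_request(host, token, cluster_id):
    """Build a GET request for /api/2.0/clusters/get to read the cluster state."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    url = f"{host.rstrip('/')}/api/2.0/clusters/get?{query}"
    return urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Bearer {token}"}
    )


def send(request, timeout=30):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = resp.read()
    return json.loads(payload) if payload else {}


def start_cluster(host, token, cluster_id):
    """Ask Databricks to start a terminated cluster (returns immediately)."""
    return send(build_cluster_request(host, token, "start", cluster_id))


def terminate_cluster(host, token, cluster_id):
    """Ask Databricks to terminate a running cluster."""
    return send(build_cluster_request(host, token, "delete", cluster_id))


def cluster_state(host, token, cluster_id):
    """Return the cluster's state string, e.g. PENDING, RUNNING or TERMINATED."""
    return send(build_state_request(host, token, cluster_id)).get("state")
```

In ADF the same calls can be made from a Web activity at the start and end of the pipeline: call start first, let the ETL run, then call terminate as the final step. Polling `cluster_state` until it reports `RUNNING` is optional, since the start request returns before the cluster is up.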
Regards
Toby
https://thedatacrew.com