Could anyone please explain the impact of autoscaling on cluster cost?
Suppose I have a cluster where the minimum worker count is 2 and the maximum is 10, but most of the time only 3 workers are active. Will the cluster be billed for just 3 workers, or for all 10 (even though 7 of them are idle most of the time)?
@Deepak Bhatt :
Autoscaling in Databricks can have a significant impact on cluster cost, as it allows the cluster to dynamically add or remove workers based on the workload.
In the scenario you described, if the active worker count is consistently at 3, the cluster will be billed for 3 workers most of the time, regardless of the configured maximum. However, if occasional spikes in workload require additional workers, the cluster may temporarily scale up to meet the demand and incur additional costs during those periods.
The cost of autoscaling depends on the instance type and the duration of the scaling events. For example, adding or removing a worker can take a few minutes, and during that time the cluster incurs costs for the additional worker(s) even if they are not yet fully utilized. Larger instance types also have higher hourly costs than smaller ones, so autoscaling with larger instances results in a larger cost swing per scaling event.
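To make the billing intuition concrete, here is a rough back-of-the-envelope comparison between a fixed 10-worker cluster and an autoscaling one that runs 3 workers most of the time. The hourly rate and the 95%/5% workload split are hypothetical placeholders, not Databricks pricing; substitute your own DBU and VM rates.

```python
# Rough monthly cost comparison: fixed 10 workers vs. autoscaling
# that mostly sits at 3 workers. All rates are hypothetical.

HOURS_PER_MONTH = 720
COST_PER_WORKER_HOUR = 0.50  # placeholder $/worker-hour (DBU + VM combined)

def monthly_cost(workers: float, hours: int = HOURS_PER_MONTH) -> float:
    """Billing scales with actual worker-hours, not the configured maximum."""
    return workers * hours * COST_PER_WORKER_HOUR

fixed_10 = monthly_cost(10)  # always-on 10 workers

# Autoscaling: assume 3 workers ~95% of the time, 10 during spikes (~5%).
autoscaled = monthly_cost(3) * 0.95 + monthly_cost(10) * 0.05

print(f"fixed 10 workers: ${fixed_10:,.2f}")
print(f"autoscaled (~3):  ${autoscaled:,.2f}")
```

With these placeholder numbers the autoscaled cluster costs roughly a third of the fixed one, which is the point of the answer above: you pay for the workers that actually run, not for the maximum you configure.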
To minimize costs with autoscaling, monitor cluster usage and adjust the minimum and maximum worker counts to match your workload patterns. Setting the minimum to the number of workers needed during quiet periods keeps the baseline cost low while still allowing scale-up during peaks. Likewise, setting the maximum to a level that actually matches peak demand avoids paying for scale-up beyond what the workload needs.
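For reference, the min/max tuning described above maps onto the `autoscale` block of a cluster spec for the Databricks Clusters API (`POST /api/2.0/clusters/create`). The field names below follow the public API; the cluster name, runtime version, and node type are example values, not recommendations.

```python
# Sketch of an autoscaling cluster spec for the Databricks Clusters API.
# Only the `autoscale` block controls the min/max worker behavior
# discussed above; the other values are illustrative placeholders.
cluster_spec = {
    "cluster_name": "etl-autoscaling",    # placeholder name
    "spark_version": "13.3.x-scala2.12",  # example runtime
    "node_type_id": "i3.xlarge",          # example instance type
    "autoscale": {
        "min_workers": 2,   # baseline: these workers run (and bill) continuously
        "max_workers": 10,  # ceiling: extra workers bill only while running
    },
    "autotermination_minutes": 30,  # stop the whole cluster after 30 min idle
}
```

Pairing autoscaling with `autotermination_minutes` is a common cost-control combination: autoscaling trims the worker count during the job, and auto-termination stops billing entirely once the cluster goes idle.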