filipniziol
Esteemed Contributor

Hi @h_h_ak ,

Short Answer:

  • Autoscaling primarily depends on the number of pending tasks.
  • Workspaces on the Premium plan use optimized autoscaling, while those on the Standard plan use standard autoscaling.

Long Answer:

Databricks autoscaling responds mainly to sustained backlogs of unscheduled tasks rather than CPU or memory usage alone. If the number of pending tasks consistently exceeds your current cluster capacity—meaning more tasks are queued than available cores can handle—Databricks will consider adding a new worker node.
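To make this concrete, autoscaling is enabled per cluster by specifying a worker range rather than a fixed worker count. A minimal sketch of such a cluster spec follows; the field names match the Databricks Clusters API, but the runtime version, node type, and cluster name here are placeholder values, not recommendations:

```python
# Minimal sketch of a cluster spec that enables autoscaling.
# The "autoscale" block (instead of a fixed "num_workers") is what
# lets Databricks resize the cluster based on the task backlog.
cluster_spec = {
    "cluster_name": "autoscaling-demo",    # hypothetical name
    "spark_version": "15.4.x-scala2.12",   # placeholder runtime version
    "node_type_id": "i3.xlarge",           # placeholder node type
    "autoscale": {
        "min_workers": 2,  # floor: the cluster never shrinks below this
        "max_workers": 8,  # ceiling: a sustained backlog can grow it to here
    },
}

print(cluster_spec["autoscale"])
```

With this spec, the cluster starts small and only grows toward `max_workers` when the pending-task queue justifies it.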

Key Points:

  • Pending Tasks as the Main Trigger: Autoscaling monitors how many tasks remain queued. Persistent queues indicate that existing workers can’t keep up, prompting additional workers.
  • Not Instantaneous, But Sustained Load: The autoscaler waits to confirm that the increased demand isn’t just a short-lived spike. Only after tasks remain pending for a threshold period does scaling occur.
  • Indirect Role of CPU/Memory Utilization: While CPU/memory affect task completion speed, autoscaling decisions are based on task queues rather than these metrics directly.
  • Timing and Reaction: Provisioning a new worker typically takes a minute or so, so scaling in practice responds to stable workload increases rather than momentary fluctuations.
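The decision logic in the points above can be sketched as a toy function. This is an illustration of the "sustained backlog exceeds capacity" idea only, not Databricks' actual algorithm; the function name, the sampling model, and the `sustained_checks` threshold are all assumptions made for the example:

```python
# Toy illustration (NOT Databricks' real implementation): scale up only
# when the pending-task backlog exceeds current core capacity for a
# sustained number of consecutive monitoring samples.
def should_scale_up(pending_history, cores_per_worker, workers,
                    sustained_checks=3):
    """Return True if the backlog exceeded capacity for each of the
    last `sustained_checks` samples in `pending_history`."""
    capacity = cores_per_worker * workers
    recent = pending_history[-sustained_checks:]
    return (len(recent) == sustained_checks
            and all(pending > capacity for pending in recent))

# Capacity here is 4 cores x 2 workers = 8 concurrent tasks.
# A brief spike does not trigger scaling:
print(should_scale_up([50, 5, 2], cores_per_worker=4, workers=2))    # False
# ...but a sustained backlog does:
print(should_scale_up([50, 60, 55], cores_per_worker=4, workers=2))  # True
```

The key design point mirrored here is that a single large sample is ignored; only a queue that stays above capacity across the whole window triggers a new worker.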

Useful Links: