Administration & Architecture

Possible to programmatically adjust Databricks instance pool more intelligently?

mrstevegross
Contributor III

We'd like to adopt Databricks instance pool in order to reduce instance-acquisition times (a significant contributor to our test latency). Based on my understanding of the docs, the main levers we can control are: min instance count, max instance count, and idle termination time.

However, our usage pattern is bursty: every 2 hours we run a full test suite, which means ~100 instances need to be acquired. After the test run is done, we release the instances (until the next ~2 hour run). Are there any options for managing the instance pool to better take advantage of the predictable bursty usage pattern?

4 REPLIES

Isi
Honored Contributor III

Hi @mrstevegross ,

I'd like to clarify how those ~100 instances are used:

Are all 100 instances required simultaneously (fully parallel execution), or do they ramp up progressively based on test needs?

How long do the tests typically run before the instances are released?

If all 100 instances are needed at once (fully parallel execution):

• Pre-warm the pool: about 1h 55m into each 2-hour cycle (that is, a few minutes before the next test run), trigger 100 API calls to acquire instances ahead of time.
• Set idle_termination = 0 (the pool's idle instance auto-termination setting) so that instances shut down as soon as the tests complete.
• This keeps instance-allocation latency to a minimum.
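To make the timing concrete, here is a minimal sketch of how the warm-up start could be computed from the 2-hour cadence mentioned in the question. The 5-minute lead time and the function name `next_warmup_time` are assumptions for illustration only; the right lead time depends on how long instance acquisition takes in your workspace.

```python
from datetime import datetime, timedelta

RUN_INTERVAL = timedelta(hours=2)   # cadence described in the question
WARMUP_LEAD = timedelta(minutes=5)  # assumed lead time; tune to observed acquisition time


def next_warmup_time(last_run_start: datetime) -> datetime:
    """Return when pre-warming should begin ahead of the next scheduled run."""
    next_run_start = last_run_start + RUN_INTERVAL
    return next_run_start - WARMUP_LEAD


if __name__ == "__main__":
    last_run = datetime(2025, 1, 1, 8, 0)
    print(next_warmup_time(last_run))  # 2025-01-01 09:55:00
```

One caveat, as far as I understand the pool settings: with an idle timeout of 0, instances that return to the pool unused are released right away, so whatever does the warming has to keep holding the instances until the tests actually start.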

If instance usage is progressive (some start earlier, some later):

• Adjust idle_termination_time so instances stay warm long enough to complete dependent tasks, but don't stay idle too long.
• Set a small min instance count to keep a few warm if the progressive ramp-up is predictable.
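As a rough sketch of what adjusting those two settings programmatically might look like, the helper below builds a request body for the Instance Pools edit endpoint (`POST /api/2.0/instance-pools/edit`). The helper name `pool_edit_payload`, the pool ID, the pool name, the node type, and the numbers are all placeholders I made up; only the field names are meant to match the API, and it is worth checking them against the current API reference before relying on this.

```python
import json


def pool_edit_payload(pool_id, pool_name, node_type_id,
                      min_idle_instances, idle_minutes, max_capacity=None):
    """Build a request body for the instance pool edit endpoint (hypothetical helper)."""
    payload = {
        "instance_pool_id": pool_id,
        "instance_pool_name": pool_name,
        "node_type_id": node_type_id,
        # Number of instances the pool keeps warm even when nothing is using them.
        "min_idle_instances": min_idle_instances,
        # How long idle instances above the minimum are kept before termination.
        "idle_instance_autotermination_minutes": idle_minutes,
    }
    if max_capacity is not None:
        payload["max_capacity"] = max_capacity
    return payload


if __name__ == "__main__":
    # Placeholder values: keep 5 instances warm, release extra idle ones after 15 minutes.
    body = pool_edit_payload("<pool-id>", "test-pool", "<node-type-id>",
                             min_idle_instances=5, idle_minutes=15, max_capacity=100)
    print(json.dumps(body, indent=2))
```

The same payload could in principle be sent on a schedule (a higher minimum shortly before a run, a lower one afterwards), which is one way to line the pool up with a predictable burst.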

 

If you can share more about how your test execution scales over time, I'd be happy to refine the strategy further!

🙂

Isi

mrstevegross
Contributor III

>If all 100 instances are needed at once (fully parallel execution):

>• Pre-warm the pool: Around 1h 55m before each test run, trigger 100 API calls to acquire instances ahead of time.
>• Set idle_termination = 0 to ensure instances shut down as soon as tests complete.

Which API call do you have in mind to "acquire" instances? (Looking at the API docs, it's not obvious which call would do this.)

Thanks,

--Steve

Isi
Honored Contributor III

Hi Steve,

If the goal is to pre-warm 100 instances in the Databricks Instance Pool, you could create a temporary job that will request instances from the pool. This ensures that Databricks provisions the required instances before the actual test run.

The cluster started by that job will force Databricks to allocate 100 instances from the pool. Since autotermination_minutes can be set to 10 (for example), the cluster will shut down shortly after the instances are ready. Your actual test jobs should then be able to pick up the pre-warmed instances without waiting for new ones to be provisioned.
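For illustration, a warm-up cluster definition along those lines might look like the sketch below, written as a body for the Clusters create endpoint (`POST /api/2.0/clusters/create`). The helper name `warmup_cluster_spec`, the cluster name, the pool ID, and the Spark version string are placeholders of mine. I am also assuming the driver is drawn from the same pool as the workers, which is why the worker count is one less than the target.

```python
import json


def warmup_cluster_spec(pool_id, spark_version, total_instances=100,
                        autotermination_minutes=10):
    """Build a cluster-create body that draws `total_instances` nodes from a pool."""
    if total_instances < 1:
        raise ValueError("total_instances must be at least 1")
    return {
        "cluster_name": "pool-warmup",            # placeholder name
        "spark_version": spark_version,
        "instance_pool_id": pool_id,              # node type comes from the pool
        "num_workers": total_instances - 1,       # assumes the driver uses one pool instance
        "autotermination_minutes": autotermination_minutes,
    }


if __name__ == "__main__":
    spec = warmup_cluster_spec("<pool-id>", "<spark-version>")
    print(json.dumps(spec, indent=2))
```

For the instances to still be warm when the tests start, the pool's idle timeout needs to be long enough to cover the gap between this cluster terminating and the test clusters starting.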

I think this is an idea that could work to pre-warm the instances.

🙂

Isi


>If the goal is to pre-warm 100 instances in the Databricks Instance Pool, you could create a temporary job that will request instances from the pool. This ensures that Databricks provisions the required instances before the actual test run.

>The cluster will force Databricks to allocate 100 instances from the pool. Since autotermination_minutes can be set to 10 (for example), it will shut down shortly after the instances are ready. Your actual test jobs will then be able to use the pre-warmed instances instantly.

Yeah, that makes sense, and could be a workable tactic. Thanks!