<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Programmatically set minimum workers for a job cluster based on file size? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/programmatically-set-minimum-workers-for-a-job-cluster-based-on/m-p/127517#M47996</link>
    <description>&lt;P&gt;Hi Alena,&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Jobs API has an update endpoint for exactly that:&amp;nbsp;&lt;A href="https://docs.databricks.com/api/workspace/jobs_21/update" target="_blank"&gt;https://docs.databricks.com/api/workspace/jobs_21/update&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If for some reason you can’t update the job before you trigger it, you can also create a new job with the desired configuration on every trigger (POST&amp;nbsp;/api/2.2/jobs/create).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kerem Durak&lt;/P&gt;</description>
    <pubDate>Tue, 05 Aug 2025 23:27:56 GMT</pubDate>
    <dc:creator>kerem</dc:creator>
    <dc:date>2025-08-05T23:27:56Z</dc:date>
    <item>
      <title>Programmatically set minimum workers for a job cluster based on file size?</title>
      <link>https://community.databricks.com/t5/data-engineering/programmatically-set-minimum-workers-for-a-job-cluster-based-on/m-p/127498#M47987</link>
      <description>&lt;P&gt;I’m running an ingestion pipeline with a Databricks job:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;A file lands in S3&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;A Lambda is triggered&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The Lambda runs a Databricks job&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The incoming files vary a lot in size, which makes processing times vary as well. My job cluster has autoscaling enabled, but scaling up takes time.&lt;/P&gt;&lt;P&gt;Ideally, if a 10 GB file comes in, I’d like the job to start with more than one worker immediately, instead of waiting for autoscaling to kick in.&lt;/P&gt;&lt;P&gt;I’m currently using the run-now API to trigger the job, but I don’t see a way to adjust the job cluster configuration at runtime.&lt;/P&gt;&lt;P&gt;Is there a way to programmatically set the minimum number of workers for a job cluster depending on the incoming file size?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 05 Aug 2025 20:41:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/programmatically-set-minimum-workers-for-a-job-cluster-based-on/m-p/127498#M47987</guid>
      <dc:creator>Alena</dc:creator>
      <dc:date>2025-08-05T20:41:55Z</dc:date>
    </item>
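The Lambda step described in the question can be sketched as follows. This is a minimal illustration, not the poster's actual code: `DATABRICKS_HOST`, `TOKEN`, and `JOB_ID` are placeholder values, and the S3 object size is read from the standard S3 put-event payload so it could later drive cluster sizing.

```python
import json
import urllib.request

# Placeholders -- substitute your workspace URL, a PAT or OAuth token,
# and the job ID of the ingestion job.
DATABRICKS_HOST = "https://example.cloud.databricks.com"
TOKEN = "REDACTED"
JOB_ID = 123

def object_size(event):
    # S3 put-event notifications carry the object size in bytes.
    return event["Records"][0]["s3"]["object"]["size"]

def handler(event, context):
    # Trigger the Databricks job via the Jobs 2.1 run-now endpoint.
    # object_size(event) is available here to decide on sizing first.
    req = urllib.request.Request(
        DATABRICKS_HOST + "/api/2.1/jobs/run-now",
        data=json.dumps({"job_id": JOB_ID}).encode(),
        headers={"Authorization": "Bearer " + TOKEN,
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```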
    <item>
      <title>Re: Programmatically set minimum workers for a job cluster based on file size?</title>
      <link>https://community.databricks.com/t5/data-engineering/programmatically-set-minimum-workers-for-a-job-cluster-based-on/m-p/127517#M47996</link>
      <description>&lt;P&gt;Hi Alena,&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Jobs API has an update endpoint for exactly that:&amp;nbsp;&lt;A href="https://docs.databricks.com/api/workspace/jobs_21/update" target="_blank"&gt;https://docs.databricks.com/api/workspace/jobs_21/update&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If for some reason you can’t update the job before you trigger it, you can also create a new job with the desired configuration on every trigger (POST&amp;nbsp;/api/2.2/jobs/create).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kerem Durak&lt;/P&gt;</description>
      <pubDate>Tue, 05 Aug 2025 23:27:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/programmatically-set-minimum-workers-for-a-job-cluster-based-on/m-p/127517#M47996</guid>
      <dc:creator>kerem</dc:creator>
      <dc:date>2025-08-05T23:27:56Z</dc:date>
    </item>
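The update approach from the reply can be sketched like this: the Lambda computes a starting worker count from the file size, then calls Jobs API update before run-now. The size thresholds, the `job_cluster_key` of "main", and the job ID are all illustrative assumptions; note also that update replaces listed top-level fields wholesale, so a real payload would carry the full `new_cluster` spec (spark_version, node_type_id, etc.), not just the autoscale block shown here.

```python
import json

# Hypothetical sizing policy: pick a starting worker count from the
# incoming file size. Thresholds are illustrative, not tuned.
def min_workers_for(size_gb: float) -> int:
    if size_gb >= 50:
        return 8
    if size_gb >= 10:
        return 4
    return 1

# Build a Jobs API 2.1 update payload that resizes the job cluster.
# "main" is an assumed job_cluster_key; a real payload would include
# the complete new_cluster spec, since update replaces listed fields.
def build_update_payload(job_id: int, size_gb: float) -> dict:
    return {
        "job_id": job_id,
        "new_settings": {
            "job_clusters": [
                {
                    "job_cluster_key": "main",
                    "new_cluster": {
                        "autoscale": {
                            "min_workers": min_workers_for(size_gb),
                            "max_workers": 16,
                        }
                    },
                }
            ]
        },
    }

payload = build_update_payload(123, size_gb=10.5)
print(json.dumps(payload))
# The Lambda would POST this to {host}/api/2.1/jobs/update with a
# bearer token, then call /api/2.1/jobs/run-now once it succeeds.
```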
  </channel>
</rss>

