I'm running an ingestion pipeline with a Databricks job:
A file lands in S3
A Lambda is triggered
The Lambda runs a Databricks job
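For context, here's a simplified sketch of what the Lambda does today. The host, token, and job ID come from environment variables, and the notebook parameter names are just what my notebook happens to expect; treat it as an illustration of the run-now call, not the exact production code:

```python
import json
import os
import urllib.parse
import urllib.request

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # token stored as an encrypted env var / secret
JOB_ID = int(os.environ["DATABRICKS_JOB_ID"])

def handler(event, context):
    # The S3 put event carries one record per object that landed
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Trigger the existing Databricks job, handing it the file location as notebook parameters
    payload = {
        "job_id": JOB_ID,
        "notebook_params": {"bucket": bucket, "key": key},
    }
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # response contains the run_id
```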
The incoming files vary a lot in size, which makes processing times vary as well. My job cluster has autoscaling enabled, but scaling up takes time.
Ideally, if a 10 GB file comes in, I'd like the job to start with more than one worker immediately, instead of waiting for autoscaling to kick in.
I'm currently using the run-now API to trigger the job, but I don't see a way to adjust the job cluster configuration at runtime.
Is there a way to programmatically set the minimum number of workers for a job cluster depending on the incoming file size?
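To make the question concrete, this is the kind of logic I'd like the Lambda to apply. The `min_workers` field in the payload below is invented purely to show the intent, because I can't find anything equivalent in the run-now API:

```python
def pick_min_workers(size_bytes: int) -> int:
    # Rough mapping from incoming file size to the worker count the job should start with
    if size_bytes < 1 * 1024**3:   # under 1 GB
        return 1
    if size_bytes < 5 * 1024**3:   # 1 to 5 GB
        return 4
    return 8                       # 10 GB-class files

def handler(event, context):
    record = event["Records"][0]
    size_bytes = record["s3"]["object"]["size"]  # the S3 event already includes the object size

    payload = {
        "job_id": JOB_ID,
        "notebook_params": {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        },
        # Hypothetical field: this is what I'd like to express, but run-now
        # doesn't seem to accept any cluster sizing overrides.
        "min_workers": pick_min_workers(size_bytes),
    }
    # ... POST payload to /api/2.1/jobs/run-now as in the snippet above
```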