Load assignment during Distributed training

aswinkks — Wed, 28 May 2025 08:16:53 GMT

Hi,

I wanted to confirm: in distributed training, is there any way to manually control what kind or amount of load/data is sent to specific worker nodes? Or is it handled completely automatically by Spark's scheduler, with no control on our side?

Re: Load assignment during Distributed training

Renu_ — Thu, 29 May 2025 12:17:37 GMT

From what I know, Spark automatically handles how data and workload are distributed across worker nodes during distributed training. You can't manually control exactly what or how much data goes to a specific node. You can still influence the distribution to some extent by using techniques like repartition, partitionBy, or custom partitioners. These control how the data is split across partitions, but not which worker node ends up processing which partition. Spark’s scheduler still decides that behind the scenes.
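To make the "custom partitioner" idea concrete, here is a minimal sketch in plain Python (no Spark needed to run it). It only shows the key-to-partition mapping. The names `custom_partitioner`, `partition_sizes`, `NUM_PARTITIONS`, and the "large_" key prefix are all made up for illustration and are not part of any Spark API.

```python
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def custom_partitioner(key: str) -> int:
    """Map a record key to a partition index in [0, NUM_PARTITIONS).

    Hypothetical rule: every key starting with "large_" goes to partition 0;
    all other keys are spread over partitions 1..NUM_PARTITIONS-1.
    """
    if key.startswith("large_"):
        return 0
    # Byte-sum instead of the built-in hash(): str hashes are salted per
    # process, so hash() would give different partitions on different runs.
    return 1 + sum(key.encode("utf-8")) % (NUM_PARTITIONS - 1)


def partition_sizes(keys):
    """Count how many keys land in each partition under custom_partitioner."""
    counts = Counter(custom_partitioner(k) for k in keys)
    return [counts.get(i, 0) for i in range(NUM_PARTITIONS)]


# Made-up keys: 5 "large" records and 9 "small" ones.
keys = [f"large_{i}" for i in range(5)] + [f"small_{i}" for i in range(9)]
sizes = partition_sizes(keys)
print(sizes)  # one count per partition; index 0 holds all "large_" keys
```

In PySpark, a function of this shape can be passed as the partition function to `partitionBy` on a key-value RDD, e.g. `rdd.partitionBy(NUM_PARTITIONS, custom_partitioner)`. That decides which records share a partition and how large each partition is. It still does not pin a partition to a particular worker node; the scheduler assigns partitions to executors.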
