Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Load assignment during Distributed training

aswinkks
New Contributor III

Hi,

I wanted to confirm: in distributed training, is there any way to manually control what kind or amount of load/data is sent to specific worker nodes? Or is this handled entirely automatically by Spark's scheduler, with no control on our side?

1 REPLY

Renu_
Valued Contributor II

From what I know, Spark automatically handles how data and workload are distributed across worker nodes during distributed training, you can't manually control exactly what or how much data goes to a specific node. You can still influence the distribution to some extent by using techniques like repartition, partitionBy, or custom partitioners. These help control how the data is distributed across partitions, but not which worker node ends up processing which part. Sparkโ€™s scheduler still decides that part behind the scenes.