<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Load assignment during Distributed training in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/load-assignment-during-distributed-training/m-p/120531#M3413</link>
    <description>&lt;P&gt;From what I know, Spark automatically handles how data and workload are distributed across worker nodes during distributed training, so you can't manually control exactly what or how much data goes to a specific node. You can still influence the distribution to some extent with techniques like repartition, partitionBy, or custom partitioners. These control how the data is split across partitions, but not which worker node ends up processing which partition. Spark’s scheduler still decides that behind the scenes.&lt;/P&gt;</description>
    <pubDate>Thu, 29 May 2025 12:17:37 GMT</pubDate>
    <dc:creator>Renu_</dc:creator>
    <dc:date>2025-05-29T12:17:37Z</dc:date>
    <item>
      <title>Load assignment during Distributed training</title>
      <link>https://community.databricks.com/t5/administration-architecture/load-assignment-during-distributed-training/m-p/120385#M3403</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I wanted to confirm: in distributed training, is there any way to manually control what kind or amount of load/data is sent to specific worker nodes? Or is this handled entirely by Spark's scheduler, with no control on our side?&lt;/P&gt;</description>
      <pubDate>Wed, 28 May 2025 08:16:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/load-assignment-during-distributed-training/m-p/120385#M3403</guid>
      <dc:creator>aswinkks</dc:creator>
      <dc:date>2025-05-28T08:16:53Z</dc:date>
    </item>
    <item>
      <title>Re: Load assignment during Distributed training</title>
      <link>https://community.databricks.com/t5/administration-architecture/load-assignment-during-distributed-training/m-p/120531#M3413</link>
      <description>&lt;P&gt;From what I know, Spark automatically handles how data and workload are distributed across worker nodes during distributed training, so you can't manually control exactly what or how much data goes to a specific node. You can still influence the distribution to some extent with techniques like repartition, partitionBy, or custom partitioners. These control how the data is split across partitions, but not which worker node ends up processing which partition. Spark’s scheduler still decides that behind the scenes.&lt;/P&gt;</description>
      <pubDate>Thu, 29 May 2025 12:17:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/load-assignment-during-distributed-training/m-p/120531#M3413</guid>
      <dc:creator>Renu_</dc:creator>
      <dc:date>2025-05-29T12:17:37Z</dc:date>
    </item>
  </channel>
</rss>

