Re: Does Databricks supports the Pytorch Distribut...

adarsh8304 · 11-29-2024

Hey, so we can't even use `TorchDistributor` with Distributed Data Parallel to get distributed training working in my code, even though `TorchDistributor` is a Spark-based distribution library? With this setup I am not getting the distributed training I expected: the second node (the worker) shows no activity in the metrics.

To put the question more directly: how should we do distributed training on a Databricks multi-node setup that has 1 driver and 1 worker? @-werners- @axb0 @Smu_Tan, should we move away from PyTorch entirely for this, write it purely in Spark, or is there a dependency that can help with this approach?