Kumaran
Databricks Employee

Hi @phdykd,
Thank you for posting your question in the Databricks community.

  1. One approach is to pass start_method="fork" to the spawn call: mp.spawn(*prev_args, start_method="fork"). This works, but it may raise a warning recommending the method in option 2 below.

  2. Another recommended solution, according to PyTorch (link), is to use torch.multiprocessing.start_processes: torch.multiprocessing.start_processes(*prev_args, start_method="fork").

  3. Note that options 1 and 2 are not compatible with CUDA (link, link), so running any .cuda-related command will fail.

  4. The solution that resolves all of these issues is to use TorchDistributor(local_mode=True).
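For options 1 and 2, here is a minimal CPU-only sketch of a fork-based launch with torch.multiprocessing.start_processes. The names cpu_worker and launch_with_fork and the shared results array are hypothetical and only there for illustration; your own training function takes their place. It assumes a Linux environment, since the fork start method is not available on Windows.

```python
import multiprocessing


def cpu_worker(rank, world_size, results):
    """Runs once per child process; start_processes passes `rank` as the first argument."""
    # CPU-only work: per option 3, avoid any CUDA call in fork-started children.
    results[rank] = rank * rank


def launch_with_fork(nprocs=2):
    # Imported here so cpu_worker can be exercised on its own without torch installed.
    import torch.multiprocessing as mp

    # Shared integer array (hypothetical helper) so the parent can read child results.
    ctx = multiprocessing.get_context("fork")
    results = ctx.Array("i", nprocs)

    # Option 2: same arguments as mp.spawn, plus start_method="fork".
    mp.start_processes(
        cpu_worker,
        args=(nprocs, results),
        nprocs=nprocs,
        join=True,
        start_method="fork",
    )
    return results[:]
```

Calling launch_with_fork(2) from a notebook cell starts two forked children and returns one value per rank. Swapping mp.start_processes for mp.spawn with the same arguments gives option 1.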

Please refer to this documentation for more details.
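For option 4, a minimal sketch of what a TorchDistributor(local_mode=True) launch could look like. It assumes a runtime where pyspark.ml.torch.distributor is available (PySpark 3.4 or later) and that the code runs in a notebook. The names train_fn, pick_backend and run_local, and the toy linear model, are hypothetical placeholders for your own training code.

```python
import os


def pick_backend(use_gpu):
    """NCCL is the usual backend for GPU runs, Gloo for CPU-only runs."""
    return "nccl" if use_gpu else "gloo"


def train_fn(use_gpu):
    # Imports live inside the function because it executes in separate worker processes.
    import torch
    import torch.distributed as dist

    # The launcher sets the rendezvous environment variables (RANK, WORLD_SIZE, ...).
    dist.init_process_group(backend=pick_backend(use_gpu))
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device(f"cuda:{local_rank}" if use_gpu else "cpu")
    if use_gpu:
        torch.cuda.set_device(device)

    # Toy model and one optimization step, standing in for a real training loop.
    model = torch.nn.Linear(4, 1).to(device)
    ddp_model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[local_rank] if use_gpu else None
    )
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    x = torch.randn(8, 4, device=device)
    y = torch.randn(8, 1, device=device)
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()
    return loss.item()


def run_local(num_processes=2, use_gpu=True):
    from pyspark.ml.torch.distributor import TorchDistributor

    # local_mode=True runs all processes on the driver node.
    distributor = TorchDistributor(
        num_processes=num_processes, local_mode=True, use_gpu=use_gpu
    )
    return distributor.run(train_fn, use_gpu)
```

Calling run_local(num_processes=2, use_gpu=True) in a notebook cell would start two training processes on the driver, and CUDA calls inside train_fn are fine here because the processes are not created by forking the notebook process.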