07-21-2023 12:59 PM
Hi @phdykd,
Thank you for posting your question in the Databricks community.
One approach is to pass start_method="fork" to the spawn call: mp.spawn(*prev_args, start_method="fork"). This works, but it may raise a warning recommending the second option below instead.
The second option, which PyTorch recommends (link), is to call torch.multiprocessing.start_processes directly: torch.multiprocessing.start_processes(*prev_args, start_method="fork").
Note that neither of the options above is compatible with CUDA (link, link), because both rely on the fork start method. Any .cuda call made in the forked processes will fail.
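As a rough illustration of the fork-based options above, here is a minimal CPU-only sketch. The names shard_bounds, train_worker, and launch are hypothetical and not part of the original question; the only PyTorch call used is torch.multiprocessing.start_processes, which invokes the target as fn(rank, *args).

```python
def shard_bounds(rank, world_size, n_items):
    """Hypothetical helper: split n_items into contiguous per-worker ranges."""
    per_worker = n_items // world_size
    start = rank * per_worker
    # The last worker also takes any remainder.
    stop = n_items if rank == world_size - 1 else start + per_worker
    return start, stop


def train_worker(rank, world_size, n_items):
    # start_processes passes the process index (rank) as the first argument.
    # Keep this CPU-only: CUDA cannot be used in forked child processes.
    start, stop = shard_bounds(rank, world_size, n_items)
    print(f"worker {rank}/{world_size} handles items [{start}, {stop})")


def launch(world_size=2, n_items=10):
    # Imported here so the sketch can be defined without PyTorch installed.
    import torch.multiprocessing as mp

    # With "fork", the children inherit the parent's memory, so functions
    # defined in a notebook cell do not have to be importable by name.
    mp.start_processes(
        train_worker,
        args=(world_size, n_items),
        nprocs=world_size,
        join=True,
        start_method="fork",
    )
```

Calling launch() from a notebook cell starts two forked workers and waits for both to finish.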
The solution that resolves all of these issues, including CUDA support, is to use TorchDistributor(local_mode=True).
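A minimal sketch of the TorchDistributor approach might look like the following. It assumes PySpark 3.4 or later (where pyspark.ml.torch.distributor is available) and an active Spark session, as on a Databricks cluster. The names distributor_kwargs, train_fn, and run_on_driver are hypothetical, and the training body is a placeholder rather than a real training loop.

```python
def distributor_kwargs(num_processes, use_gpu):
    """Hypothetical helper: constructor arguments for a driver-local run."""
    # local_mode=True keeps all training processes on the driver node.
    return {"num_processes": num_processes, "local_mode": True, "use_gpu": use_gpu}


def train_fn(learning_rate):
    # Imports live inside the function because it runs in separate processes.
    import torch

    # Unlike the fork-based options, CUDA may be used here when available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.ones(4, device=device)
    return float((x * learning_rate).sum())  # placeholder for a real result


def run_on_driver(num_processes=2, use_gpu=True):
    # Imported here so the sketch can be defined without PySpark installed.
    from pyspark.ml.torch.distributor import TorchDistributor

    distributor = TorchDistributor(**distributor_kwargs(num_processes, use_gpu))
    # Extra positional arguments after train_fn are forwarded to train_fn.
    return distributor.run(train_fn, 0.1)
```

When use_gpu=True, num_processes should presumably not exceed the number of GPUs on the driver; on a CPU-only cluster, pass use_gpu=False.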
Please refer to this documentation for more details.