Cannot re-initialize CUDA in forked subprocess.
07-12-2023 07:51 AM
This is the error I am getting: "RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method". I am using a 13.0nc12s_v3 cluster.
I tried this one: "
", but I am still getting the same issue. Is there a solution?
Thanks
07-21-2023 12:59 PM
Hi @phdykd,
Thank you for posting your question in the Databricks community.
One approach is to pass start_method="fork" to the spawn call: mp.spawn(*prev_args, start_method="fork"). This works, but it may raise a warning recommending torch.multiprocessing.start_processes instead (the second option below).
Another recommended solution, according to PyTorch (link), is to use torch.multiprocessing.start_processes: torch.multiprocessing.start_processes(*prev_args, start_method="fork").
Note that neither of the options above is compatible with CUDA (link, link): a forked subprocess cannot re-initialize CUDA, so any .cuda call in the workers will fail.
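As a rough sketch of the second option, assuming the workers only touch the CPU: the names pick_start_method, cpu_worker, and launch_cpu_workers are made up for illustration; only torch.multiprocessing.start_processes and its start_method argument come from PyTorch. The helper simply encodes the rule from the error message, that "fork" is only an option when no CUDA call is involved.

```python
def pick_start_method(uses_cuda):
    # Hypothetical helper: per the error message, CUDA needs "spawn";
    # "fork" is only safe for CPU-only workers.
    return "spawn" if uses_cuda else "fork"


def cpu_worker(rank, world_size):
    # start_processes passes the process index as the first argument.
    # Assumption: this worker stays on the CPU and never calls .cuda().
    import torch

    x = torch.ones(2) * rank
    print(f"rank {rank} of {world_size}: sum = {x.sum().item()}")


def launch_cpu_workers(world_size=2):
    # Imported lazily so the module can be loaded without PyTorch installed.
    import torch.multiprocessing as mp

    mp.start_processes(
        cpu_worker,
        args=(world_size,),
        nprocs=world_size,
        join=True,
        start_method=pick_start_method(uses_cuda=False),
    )
```

Calling launch_cpu_workers() should start two forked CPU workers. If the worker needs the GPU, this pattern does not apply.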
The solution that avoids all of these issues is to use TorchDistributor(local_mode=True).
Please refer to this Documentation for more details.
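A minimal sketch of what that could look like, assuming a runtime where pyspark.ml.torch.distributor is available (PySpark 3.4 or later) and PyTorch is installed. The functions train_fn and run_with_distributor, the toy model, and the learning rate are placeholders for illustration and are not taken from the original question.

```python
def train_fn(learning_rate):
    # Imports live inside the function so they run in the worker process
    # that TorchDistributor launches, not in the notebook driver process.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Toy model and random data, purely illustrative.
    model = torch.nn.Linear(4, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    x = torch.randn(8, 4, device=device)
    y = torch.randn(8, 1, device=device)

    # One training step.
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def run_with_distributor(num_processes=1, use_gpu=True):
    # Imported lazily so this file loads even where PySpark is absent.
    from pyspark.ml.torch.distributor import TorchDistributor

    distributor = TorchDistributor(
        num_processes=num_processes,
        local_mode=True,  # run on the driver node, as suggested above
        use_gpu=use_gpu,
    )
    # Extra positional arguments are forwarded to train_fn.
    return distributor.run(train_fn, 1e-3)
```

In a notebook, run_with_distributor() would then replace the direct mp.spawn call. Keeping all CUDA work inside train_fn means the driver process never initializes CUDA before the workers start.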

