
Cannot re-initialize CUDA in forked subprocess.

phdykd
New Contributor

This is the error I am getting: "RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method". I am using a 13.0 NC12s_v3 cluster.

I used this one:

import torch.multiprocessing as mp
# Force the 'spawn' start method before any CUDA work
mp.set_start_method('spawn', force=True)

from pytorch_lightning.callbacks import EarlyStopping

but I am still getting the same issue. Any solution?

Thanks

Kumaran
Databricks Employee

Hi @phdykd,
Thank you for posting your question in the Databricks community.

  1. One approach is to pass the start_method="fork" parameter to the spawn call, as follows: mp.spawn(*prev_args, start_method="fork"). Although this will work, it raises a warning suggesting the method in option 2 below.

  2. Another recommended solution, according to PyTorch (link), is to use torch.multiprocessing.start_processes: torch.multiprocessing.start_processes(*prev_args, start_method="fork"). Both of these options are shown in the first sketch after this list.

  3. It's important to note that the above options are not compatible with CUDA (link, link), so any .cuda()-related calls in the workers will fail.

  4. The solution that resolves all of these issues is to use TorchDistributor(local_mode=True); see the second sketch at the end of this reply.
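
For reference, here is a minimal sketch of options 1 and 2. The worker function train_fn and the process count are illustrative placeholders, not part of the original post, and *prev_args stands for whatever arguments you were already passing:

import torch.multiprocessing as mp

def train_fn(rank):
    # Illustrative worker; as noted in point 3, avoid .cuda() calls in the
    # workers when they are started with the "fork" method.
    print(f"worker {rank} started")

if __name__ == "__main__":
    # Option 1: mp.spawn with start_method="fork" (raises a warning that
    # points to torch.multiprocessing.start_processes).
    mp.spawn(train_fn, nprocs=2, start_method="fork")

    # Option 2: the API the warning recommends.
    mp.start_processes(train_fn, nprocs=2, start_method="fork")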

Please refer to this documentation for more details.
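
And a minimal sketch of option 4, assuming a single GPU on the driver (num_processes=1); the training function and its num_epochs argument are illustrative placeholders:

from pyspark.ml.torch.distributor import TorchDistributor

def train_fn(num_epochs):
    import torch
    # CUDA initializes cleanly here because TorchDistributor launches fresh
    # worker processes instead of forking the notebook/driver process.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for _ in range(num_epochs):
        pass  # real training loop goes here
    return str(device)

# local_mode=True runs the training processes on the driver node.
result = TorchDistributor(num_processes=1, local_mode=True, use_gpu=True).run(train_fn, 2)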
