Segmentation fault (core dumped) error for multiple GPUs #47
Comments
I noticed that there is a similar issue in the PyTorch repository: "Segfault in dataparallel + checkpoint" #11732. It seems that it has not been fixed yet.
@theonegis - I raised the original issue. Just to check whether they are similar problems, can you copy the faulthandler output here, to see if it also points to cp.checkpoint being the issue?
Enabling faulthandler at the beginning of your code should output a traceback when your code segfaults. (Apologies to the PyTorch devs if this is not helpful, I'm just curious.)
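For anyone following along, here is a minimal sketch of one way to turn on Python's standard-library faulthandler so that a segfault leaves a traceback behind. The helper name `enable_crash_traceback` and the log file name are hypothetical, chosen only for illustration; they are not taken from this thread.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Dump the Python traceback of all threads to log_path on a fatal signal.

    Hypothetical helper for illustration; call it before building the model.
    """
    # The file object must stay open for as long as faulthandler is enabled,
    # so it is returned and kept alive by the caller.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Enable as early as possible, ideally at the very top of the training script.
crash_log = enable_crash_traceback()
```

The same effect should be available without editing the script by running it as `python -X faulthandler your_script.py`, in which case the traceback goes to stderr.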
@theonegis what happens if you upgrade to the latest stable version of PyTorch (0.4.1)?
@gpleiss Still the same problem.
Yeah, just an FYI, I'm on 0.4.1 as well, and I can see that yours is also a checkpoint issue. What happens if you checkpoint -all- of your layers, @theonegis?
Environment:
Problem:
I was running a model that does not need BatchNorm, so I changed the original DenseNet a little bit.
Here is the code snippet:
It runs on a single GPU, but it throws a Segmentation fault (core dumped) error when running on multiple GPUs. What could be causing this issue?