-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ROCm workaround: Use ParallelFor instead of Reduce #2749
Conversation
Assuming the failure is not often, we can use ParallelFor with atomicAdd to obtain the number of failures. With this change, the ROCm memory issue seems to be gone.
this fixes #2569 |
@WeiqunZhang would you recommend that we use atomics instead of (We also use |
No, not for the functions in your links. Those kernels are small. Only if the kernel is very big and you see an issue, you might try to use atomics. Note that using atomics is very slow if it needs to be done a lot of times. |
Is there a way to identify kernels that might be affected by the same issue? The kernel that calls our reaction network is here: https://github.com/quokka-astro/quokka/blob/2d863d655ec1e1f684f55ea745021b9c97a03c60/src/Chemistry.hpp#L43 The reaction network itself (6k LOC, symbolically generated, hundreds of local variables) is here: We see errors like this when running simulations with this network:
|
You can try to insert
It might help in your case. |
We can try that. You are referring to the call immediately after the burn kernel on line 417, right? |
Yes. |
It might also help to turn on tiling |
We could try that. Does this change the kernel launch parameters (and/or, do you know why it helps)? |
It changes the number of blocks in the kernel launch. |
Assuming the failure is not often, we can use ParallelFor with atomicAdd to
obtain the number of failures. With this change, the ROCm memory issue seems
to be gone.