GEMM NN GPU fails even with the under-transfer fix #138
Labels
bug
Something isn't working
Comments
abouteiller added a commit to abouteiller/dplasma that referenced this issue (Mar 12, 2025)
abouteiller added a commit to abouteiller/parsec that referenced this issue (Mar 12, 2025): "… than crashing in a cryptic way), examplified by ICLDisco/dplasma#138"
abouteiller added a commit to abouteiller/dplasma that referenced this issue (Mar 13, 2025)
abouteiller added a commit to abouteiller/dplasma that referenced this issue (Mar 13, 2025)
abouteiller added a commit to abouteiller/dplasma that referenced this issue (Mar 13, 2025)
abouteiller added a commit to abouteiller/dplasma that referenced this issue (Mar 13, 2025)
abouteiller added a commit that referenced this issue (Mar 27, 2025)
The Gemm NN GPU is missing declaration of datatype #138
RESOLUTION: Found issue in DPLASMA, missing dplasma_add2arena for the gpuNN gemm
In gdb, we have `output->data.remote_dst_datatype == NULL`, which is not equal to `PARSEC_DATATYPE_NULL` (`MPI_DATATYPE_NULL`), so we go on and call `MPI_GET_NAME` and crash MPI.

Two issues here:
1. Explanation: this comes from GLOBAL_BARRIER Y, which is a CTL, thus with no type. This looks like a bug in get_datatype with CTL.
2. The arena_datatypes in GEMM_NN_GPU was not filled.

Not immediately clear why/if this is related to the PR, or whether we just fixed the other issue that was masking this one.
Originally posted by @abouteiller in ICLDisco/parsec#733 (comment)
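
The NULL versus `PARSEC_DATATYPE_NULL` mismatch described above can be illustrated with a small standalone sketch. This is not PaRSEC, DPLASMA, or MPI code: `mock_datatype_t`, `MOCK_DATATYPE_NULL`, `MOCK_DOUBLE`, and the three helper functions are hypothetical names. The sketch assumes that the "null" datatype sentinel is a pointer to a real predefined object, so a never-filled (zeroed) handle compares unequal to it and slips past a sentinel-only check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a datatype handle; not the PaRSEC or MPI type. */
typedef struct mock_datatype { const char *name; } *mock_datatype_t;

/* Assumption: the "null" datatype is a pointer to a predefined object,
 * so it is a different value from C NULL. */
static struct mock_datatype mock_null_object = { "MOCK_DATATYPE_NULL" };
#define MOCK_DATATYPE_NULL (&mock_null_object)

static struct mock_datatype mock_double_object = { "MOCK_DOUBLE" };
#define MOCK_DOUBLE (&mock_double_object)

/* Sentinel-only check: returns 1 when the handle is not the sentinel.
 * A zero-initialized (never filled) handle is NULL, so it also returns 1. */
static int passes_sentinel_check(mock_datatype_t dt)
{
    return dt != MOCK_DATATYPE_NULL;
}

/* Defensive check: rejects both the sentinel and a plain NULL handle. */
static int is_usable_datatype(mock_datatype_t dt)
{
    return dt != NULL && dt != MOCK_DATATYPE_NULL;
}

/* Returns the datatype name, or a placeholder string instead of
 * dereferencing an unusable handle. */
static const char *datatype_name_or_placeholder(mock_datatype_t dt)
{
    return is_usable_datatype(dt) ? dt->name : "(no datatype)";
}
```

With these definitions, `passes_sentinel_check(NULL)` is true while `is_usable_datatype(NULL)` is false, which mirrors the situation seen in gdb: the unfilled handle is not the sentinel, so a sentinel-only guard lets it through to a call that dereferences it. A guard like this would only turn the crash into a clearer failure; the resolution reported in this issue is the missing `dplasma_add2arena`.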