Matrix element-wise add/sub/multiply instructions. The input and output matrices are both accumulation registers and always with size mtilem x mtilen
. The element-wise calculation of tile registers can be implemented by combining data move instructions (such as mmve*.a.t
and mmve*.t.a
).
# Unsigned integer matrix element-wise add.
# md[i,j] = ms1[i,j] + ms2[i,j]
maddu.[hb|b|h|w|dw].mm md, ms1, ms2
msaddu.[hb|b|h|w|dw].mm md, ms1, ms2 # output saturated
mwaddu.[hb|b|h|w].mm md, ms1, ms2 # output double widen
# Signed integer matrix element-wise add.
# md[i,j] = ms1[i,j] + ms2[i,j]
madd.[hb|b|h|w|dw].mm md, ms1, ms2
msadd.[hb|b|h|w|dw].mm md, ms1, ms2 # output saturated
mwadd.[hb|b|h|w].mm md, ms1, ms2 # output double widen
# Unsigned integer matrix element-wise subtract.
# md[i,j] = ms1[i,j] - ms2[i,j]
msubu.[hb|b|h|w|dw].mm md, ms1, ms2
mssubu.[hb|b|h|w|dw].mm md, ms1, ms2 # output saturated
mwsubu.[hb|b|h|w].mm md, ms1, ms2 # output double widen
# Signed integer matrix element-wise subtract.
# md[i,j] = ms1[i,j] - ms2[i,j]
msub.[hb|b|h|w|dw].mm md, ms1, ms2
mssub.[hb|b|h|w|dw].mm md, ms1, ms2 # output saturated
mwsub.[hb|b|h|w].mm md, ms1, ms2 # output double widen
# Integer matrix element-wise minimum.
# md[i,j] = min{ms1[i,j], ms2[i,j]}
mminu.[hb|b|h|w|dw].mm md, ms1, ms2
mmin.[hb|b|h|w|dw].mm md, ms1, ms2
# Integer matrix element-wise maximum.
# md[i,j] = max{ms1[i,j], ms2[i,j]}
mmaxu.[hb|b|h|w|dw].mm md, ms1, ms2
mmax.[hb|b|h|w|dw].mm md, ms1, ms2
# Integer matrix bit-wise logic.
mand.mm md, ms1, ms2
mor.mm md, ms1, ms2
mxor.mm md, ms1, ms2
# Integer matrix element-wise shift.
msll.[hb|b|h|w|dw].mm md, ms1, ms2
msrl.[hb|b|h|w|dw].mm md, ms1, ms2
msra.[hb|b|h|w|dw].mm md, ms1, ms2
# Integer matrix element-wise multiply.
# md[i,j] = ms1[i,j] * ms2[i,j]
mmul.[hb|b|h|w|dw].mm md, ms1, ms2 # signed, returning low bits of product
mmulh.[hb|b|h|w|dw].mm md, ms1, ms2 # signed, returning high bits of product
mmulhu.[hb|b|h|w|dw].mm md, ms1, ms2 # unsigned, returning high bits of product
mmulhsu.[hb|b|h|w|dw].mm md, ms1, ms2 # signed-unsigned, returning high bits of product
# Saturated integer matrix element-wise multiply.
msmul.[hb|b|h|w|dw].mm md, ms1, ms2 # signed
msmulu.[hb|b|h|w|dw].mm md, ms1, ms2 # unsigned
msmulsu.[hb|b|h|w|dw].mm md, ms1, ms2 # signed-unsigned
# Widening integer matrix element-wise multiply.
mwmul.[hb|b|h|w].mm md, ms1, ms2 # signed
mwmulu.[hb|b|h|w].mm md, ms1, ms2 # unsigned
mwmulsu.[hb|b|h|w].mm md, ms1, ms2 # signed-unsigned
# Float matrix element-wise add.
# md[i,j] = ms1[i,j] + ms2[i,j]
mfadd.[cf|hf|f|d].mm md, ms1, ms2
mfwadd.[cf|hf|f].mm md, ms1, ms2 # output double widen
# Float matrix element-wise subtract.
# md[i,j] = ms1[i,j] - ms2[i,j]
mfsub.[cf|hf|f|d].mm md, ms1, ms2
mfwsub.[cf|hf|f].mm md, ms1, ms2 # output double widen
# Float matrix element-wise minimum.
# md[i,j] = min{ms1[i,j], ms2[i,j]}
mfmin.[cf|hf|f|d].mm md, ms1, ms2
# Float matrix element-wise maximum.
# md[i,j] = max{ms1[i,j], ms2[i,j]}
mfmax.[cf|hf|f|d].mm md, ms1, ms2
# Float matrix element-wise multiply.
# md[i,j] = ms1[i,j] * ms2[i,j]
mfmul.[cf|hf|f|d].mm md, ms1, ms2
mfwmul.[cf|hf|f].mm md, ms1, ms2 # output double widen
# Float matrix element-wise divide.
# md[i,j] = ms1[i,j] / ms2[i,j]
mfdiv.[cf|hf|f|d].mm md, ms1, ms2
# Float matrix element-wise square root.
# md[i,j] = ms1[i,j] ^ (1/2)
mfsqrt.[cf|hf|f|d].m md, ms1
Note
|
There is no matrix-scalar and matrix-vector version for element-wise instructions. Such operations can be replaced by a broadcast instruction and a matrix-matrix element-wise instruction. |