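The matrix element-wise instructions apply add, subtract, minimum, maximum, bitwise logic, shift, and multiply operations (plus divide and square root for floating point) to corresponding elements of their operands. Both the input and output matrices are accumulation registers, always of size mtilem × mtilen. Element-wise computation on tile registers can be implemented by combining these instructions with data move instructions (such as mmve*.a.t and mmve*.t.a).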
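As a rough mental model (not part of the specification), an accumulation register can be pictured as an mtilem × mtilen grid, and every instruction in this section applies one operation to each pair of elements at the same position. The Python sketch below uses hypothetical placeholder dimensions `MTILEM = MTILEN = 4` and a hypothetical helper named `elementwise`.

```python
from typing import Callable, List

Matrix = List[List[int]]

# Hypothetical placeholders for mtilem and mtilen.
MTILEM, MTILEN = 4, 4


def elementwise(op: Callable[[int, int], int], ms1: Matrix, ms2: Matrix) -> Matrix:
    """Return md where md[i][j] = op(ms1[i][j], ms2[i][j])."""
    # Both sources must have the full mtilem x mtilen shape.
    for m in (ms1, ms2):
        assert len(m) == MTILEM and all(len(row) == MTILEN for row in m)
    return [[op(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(ms1, ms2)]


ms1 = [[i * MTILEN + j for j in range(MTILEN)] for i in range(MTILEM)]
ms2 = [[1] * MTILEN for _ in range(MTILEM)]
md = elementwise(lambda a, b: a + b, ms1, ms2)
print(md[0])  # [1, 2, 3, 4]
```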

```
# Unsigned integer matrix element-wise add.
# md[i,j] = ms1[i,j] + ms2[i,j]
maddu.[hb|b|h|w|dw].mm   md, ms1, ms2
msaddu.[hb|b|h|w|dw].mm  md, ms1, ms2  # saturated output
mwaddu.[hb|b|h|w].mm     md, ms1, ms2  # double-width (widened) output
```
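The three unsigned add forms differ only in how they treat a sum that does not fit in the element width. The sketch below models a single element pair. The bit width assigned to each suffix (in particular, reading `hb` as a 4-bit half-byte) is an assumption, and the function names merely mirror the mnemonics.

```python
# Assumed element widths (bits) per suffix; "hb" is read as a 4-bit half-byte.
SEW = {"hb": 4, "b": 8, "h": 16, "w": 32, "dw": 64}


def maddu(a: int, b: int, sew: int) -> int:
    """Wrapping add: keep the low `sew` bits of the sum."""
    return (a + b) % (1 << sew)


def msaddu(a: int, b: int, sew: int) -> int:
    """Saturating add: clamp to the largest unsigned `sew`-bit value."""
    return min(a + b, (1 << sew) - 1)


def mwaddu(a: int, b: int, sew: int) -> int:
    """Widening add: the result is 2*sew bits wide, so the sum always fits."""
    return (a + b) % (1 << (2 * sew))


sew = SEW["b"]
print(maddu(200, 100, sew), msaddu(200, 100, sew), mwaddu(200, 100, sew))  # 44 255 300
```

This may also explain why the widening forms stop at `w`: a widened `dw` result would need 128 bits.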

```
# Signed integer matrix element-wise add.
# md[i,j] = ms1[i,j] + ms2[i,j]
madd.[hb|b|h|w|dw].mm    md, ms1, ms2
msadd.[hb|b|h|w|dw].mm   md, ms1, ms2  # saturated output
mwadd.[hb|b|h|w].mm      md, ms1, ms2  # double-width (widened) output
```
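For the signed forms the same distinction applies in two's complement. The wrapping form reinterprets the low bits, and the saturating form clamps to the signed range. This is a minimal sketch with hypothetical helper names. Element values are passed as ordinary Python integers that are assumed to be in range already.

```python
def to_signed(x: int, sew: int) -> int:
    """Reinterpret the low `sew` bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x


def madd(a: int, b: int, sew: int) -> int:
    """Wrapping signed add."""
    return to_signed(a + b, sew)


def msadd(a: int, b: int, sew: int) -> int:
    """Saturating signed add: clamp to [-2**(sew-1), 2**(sew-1) - 1]."""
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    return max(lo, min(hi, a + b))


def mwadd(a: int, b: int, sew: int) -> int:
    """Widening signed add: a 2*sew-bit result cannot overflow."""
    return to_signed(a + b, 2 * sew)


print(madd(100, 50, 8), msadd(100, 50, 8), mwadd(100, 50, 8))  # -106 127 150
print(msadd(-100, -50, 8))  # -128
```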

```
# Unsigned integer matrix element-wise subtract.
# md[i,j] = ms1[i,j] - ms2[i,j]
msubu.[hb|b|h|w|dw].mm   md, ms1, ms2
mssubu.[hb|b|h|w|dw].mm  md, ms1, ms2  # saturated output
mwsubu.[hb|b|h|w].mm     md, ms1, ms2  # double-width (widened) output
```

```
# Signed integer matrix element-wise subtract.
# md[i,j] = ms1[i,j] - ms2[i,j]
msub.[hb|b|h|w|dw].mm    md, ms1, ms2
mssub.[hb|b|h|w|dw].mm   md, ms1, ms2  # saturated output
mwsub.[hb|b|h|w].mm      md, ms1, ms2  # double-width (widened) output
```
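Subtraction follows the same pattern. The notable case is unsigned saturation, which clamps a negative difference to zero instead of wrapping it around. The following is a sketch with hypothetical names that models one element pair, using 8-bit elements in the example. It leaves out the unsigned widening form because the listing does not say how a negative unsigned difference is represented in the wider result.

```python
def msubu(a: int, b: int, sew: int) -> int:
    """Wrapping unsigned subtract (modulo 2**sew)."""
    return (a - b) % (1 << sew)


def mssubu(a: int, b: int, sew: int) -> int:
    """Saturating unsigned subtract: a negative difference clamps to 0."""
    return max(a - b, 0)


def mssub(a: int, b: int, sew: int) -> int:
    """Saturating signed subtract: clamp to the signed sew-bit range."""
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    return max(lo, min(hi, a - b))


def mwsub(a: int, b: int, sew: int) -> int:
    """Widening signed subtract: the exact difference is kept."""
    d = a - b
    # The difference of two signed sew-bit values always fits in 2*sew bits.
    assert -(1 << (2 * sew - 1)) <= d < (1 << (2 * sew - 1))
    return d


print(msubu(3, 5, 8), mssubu(3, 5, 8))  # 254 0
print(mssub(-128, 1, 8), mwsub(-128, 1, 8))  # -128 -129
```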

```
# Integer matrix element-wise minimum.
# md[i,j] = min{ms1[i,j], ms2[i,j]}
mminu.[hb|b|h|w|dw].mm   md, ms1, ms2  # unsigned
mmin.[hb|b|h|w|dw].mm    md, ms1, ms2  # signed
```

```
# Integer matrix element-wise maximum.
# md[i,j] = max{ms1[i,j], ms2[i,j]}
mmaxu.[hb|b|h|w|dw].mm   md, ms1, ms2  # unsigned
mmax.[hb|b|h|w|dw].mm    md, ms1, ms2  # signed
```
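For minimum and maximum, the `u` suffix changes only how a stored bit pattern is compared. In the sketch below, which uses hypothetical names and passes operands as raw `sew`-bit patterns the way they would sit in a register, the byte `0xFF` is the largest value under an unsigned comparison. Under a signed comparison it is -1, so it is smaller than `0x01`.

```python
def to_signed(x: int, sew: int) -> int:
    """Reinterpret the low `sew` bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x


def mminu(a: int, b: int, sew: int) -> int:
    return min(a, b)  # raw patterns compared as unsigned; sew unused


def mmin(a: int, b: int, sew: int) -> int:
    return a if to_signed(a, sew) <= to_signed(b, sew) else b


def mmaxu(a: int, b: int, sew: int) -> int:
    return max(a, b)  # raw patterns compared as unsigned; sew unused


def mmax(a: int, b: int, sew: int) -> int:
    return a if to_signed(a, sew) >= to_signed(b, sew) else b


a, b = 0xFF, 0x01
print(hex(mminu(a, b, 8)), hex(mmin(a, b, 8)))  # 0x1 0xff
print(hex(mmaxu(a, b, 8)), hex(mmax(a, b, 8)))  # 0xff 0x1
```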

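```
# Integer matrix bitwise logic.
mand.mm                  md, ms1, ms2  # md[i,j] = ms1[i,j] & ms2[i,j]
mor.mm                   md, ms1, ms2  # md[i,j] = ms1[i,j] | ms2[i,j]
mxor.mm                  md, ms1, ms2  # md[i,j] = ms1[i,j] ^ ms2[i,j]
```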
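Unlike the other integer instructions, the logic instructions carry no element-width suffix. A plausible reason, illustrated below, is that bitwise AND, OR, and XOR produce identical bits no matter how the register is divided into elements. The little-endian `pack`/`unpack` helpers are hypothetical and exist only for this illustration.

```python
def pack(elems: list[int], sew: int) -> int:
    """Concatenate sew-bit elements into one integer, element 0 in the low bits."""
    value = 0
    for k, e in enumerate(elems):
        value |= (e & ((1 << sew) - 1)) << (k * sew)
    return value


def unpack(value: int, sew: int, count: int) -> list[int]:
    """Split an integer back into `count` elements of `sew` bits each."""
    return [(value >> (k * sew)) & ((1 << sew) - 1) for k in range(count)]


a = [0x12, 0x34, 0x56, 0x78]
b = [0xF0, 0x0F, 0xFF, 0x00]

# AND each byte pair separately ...
per_byte = [x & y for x, y in zip(a, b)]
# ... or AND the whole 32-bit value at once, then view it as 16-bit elements.
whole = pack(a, 8) & pack(b, 8)

print([hex(v) for v in per_byte])  # ['0x10', '0x4', '0x56', '0x0']
print(whole == pack(per_byte, 8))  # True
print([hex(v) for v in unpack(whole, 16, 2)])  # ['0x410', '0x56']
```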

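```
# Integer matrix element-wise shift.
# md[i,j] = ms1[i,j] shifted by ms2[i,j]
msll.[hb|b|h|w|dw].mm    md, ms1, ms2  # logical left shift
msrl.[hb|b|h|w|dw].mm    md, ms1, ms2  # logical right shift
msra.[hb|b|h|w|dw].mm    md, ms1, ms2  # arithmetic right shift
```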
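The listing does not say how a shift amount at or above the element width is handled. The sketch below assumes the RISC-V "V" extension convention, in which only the low log2(SEW) bits of the shift amount are used. That convention and the helper names are assumptions. Operands are raw `sew`-bit patterns.

```python
def to_signed(x: int, sew: int) -> int:
    """Reinterpret the low `sew` bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x


def shamt(b: int, sew: int) -> int:
    # Assumption (RVV-style): keep only the low log2(sew) bits; sew is a power of two.
    return b & (sew - 1)


def msll(a: int, b: int, sew: int) -> int:
    return (a << shamt(b, sew)) & ((1 << sew) - 1)  # bits shifted out are dropped


def msrl(a: int, b: int, sew: int) -> int:
    return a >> shamt(b, sew)  # zeros fill from the left


def msra(a: int, b: int, sew: int) -> int:
    # Copies of the sign bit fill from the left; result returned as a raw pattern.
    return (to_signed(a, sew) >> shamt(b, sew)) & ((1 << sew) - 1)


print(hex(msll(0x81, 1, 8)), hex(msrl(0x80, 7, 8)), hex(msra(0x80, 7, 8)))  # 0x2 0x1 0xff
print(msll(1, 9, 8))  # 2 (shift amount 9 is masked to 1)
```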

```
# Integer matrix element-wise multiply.
# md[i,j] = ms1[i,j] * ms2[i,j]
mmul.[hb|b|h|w|dw].mm    md, ms1, ms2  # returns low bits of product (identical for signed and unsigned)
mmulh.[hb|b|h|w|dw].mm   md, ms1, ms2  # signed, returns high bits of product
mmulhu.[hb|b|h|w|dw].mm  md, ms1, ms2  # unsigned, returns high bits of product
mmulhsu.[hb|b|h|w|dw].mm md, ms1, ms2  # signed-unsigned, returns high bits of product
```
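The four multiply forms each return half of a double-width product. The low half is the same regardless of signedness, while the high half depends on how each operand is interpreted. In this sketch (hypothetical names, raw `sew`-bit operands), `mmulhsu` treats ms1 as signed and ms2 as unsigned. That operand order is an assumption borrowed from RISC-V `mulhsu`.

```python
def to_signed(x: int, sew: int) -> int:
    """Reinterpret the low `sew` bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x


def mmul(a: int, b: int, sew: int) -> int:
    """Low sew bits of the product (identical for signed and unsigned operands)."""
    return (a * b) & ((1 << sew) - 1)


def mmulh(a: int, b: int, sew: int) -> int:
    """High sew bits of the signed x signed product."""
    return ((to_signed(a, sew) * to_signed(b, sew)) >> sew) & ((1 << sew) - 1)


def mmulhu(a: int, b: int, sew: int) -> int:
    """High sew bits of the unsigned x unsigned product."""
    return ((a * b) >> sew) & ((1 << sew) - 1)


def mmulhsu(a: int, b: int, sew: int) -> int:
    """High sew bits of signed ms1 x unsigned ms2 (assumed operand order)."""
    return ((to_signed(a, sew) * b) >> sew) & ((1 << sew) - 1)


a = b = 0xFF  # -1 as signed, 255 as unsigned
print([hex(f(a, b, 8)) for f in (mmul, mmulh, mmulhu, mmulhsu)])
# ['0x1', '0x0', '0xfe', '0xff']
```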

```
# Saturating integer matrix element-wise multiply.
msmul.[hb|b|h|w|dw].mm   md, ms1, ms2  # signed
msmulu.[hb|b|h|w|dw].mm  md, ms1, ms2  # unsigned
msmulsu.[hb|b|h|w|dw].mm md, ms1, ms2  # signed-unsigned
```
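The listing does not spell out the exact saturating-multiply semantics. The sketch below assumes a plain integer product clamped to the destination element range. This differs from the RISC-V "V" extension's `vsmul`, which is a rounding fractional (fixed-point) multiply. The signed-unsigned form is omitted because its output range is not stated here. Function names are hypothetical, and operands are passed as ordinary in-range integers.

```python
def msmul(a: int, b: int, sew: int) -> int:
    """Signed product clamped to [-2**(sew-1), 2**(sew-1) - 1] (assumed semantics)."""
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    return max(lo, min(hi, a * b))


def msmulu(a: int, b: int, sew: int) -> int:
    """Unsigned product clamped to 2**sew - 1 (assumed semantics)."""
    return min(a * b, (1 << sew) - 1)


print(msmul(100, 2, 8), msmul(-100, 2, 8), msmul(5, -3, 8))  # 127 -128 -15
print(msmulu(20, 20, 8), msmulu(10, 10, 8))  # 255 100
```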

```
# Widening integer matrix element-wise multiply.
# md[i,j] = ms1[i,j] * ms2[i,j], output widened to double width
mwmul.[hb|b|h|w].mm      md, ms1, ms2  # signed
mwmulu.[hb|b|h|w].mm     md, ms1, ms2  # unsigned
mwmulsu.[hb|b|h|w].mm    md, ms1, ms2  # signed-unsigned
```
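The widening forms need no saturation, because the product of two SEW-bit values always fits in 2×SEW bits. The sketch below checks this at the extreme operand values for 8-bit inputs. The helper names are hypothetical, and for the signed-unsigned case it assumes ms1 is the signed operand.

```python
def fits_signed(x: int, bits: int) -> bool:
    return -(1 << (bits - 1)) <= x < (1 << (bits - 1))


def fits_unsigned(x: int, bits: int) -> bool:
    return 0 <= x < (1 << bits)


sew = 8
smin, smax, umax = -(1 << (sew - 1)), (1 << (sew - 1)) - 1, (1 << sew) - 1

# mwmul: the largest-magnitude signed product is smin * smin.
assert fits_signed(smin * smin, 2 * sew)
# mwmulu: the largest unsigned product is umax * umax.
assert fits_unsigned(umax * umax, 2 * sew)
# mwmulsu: signed ms1 times unsigned ms2 ranges from smin * umax to smax * umax.
assert fits_signed(smin * umax, 2 * sew) and fits_signed(smax * umax, 2 * sew)

print(smin * smin, umax * umax, smin * umax)  # 16384 65025 -32640
```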

```
# Float matrix element-wise add.
# md[i,j] = ms1[i,j] + ms2[i,j]
mfadd.[cf|hf|f|d].mm     md, ms1, ms2
mfwadd.[cf|hf|f].mm      md, ms1, ms2  # double-width (widened) output
```

```
# Float matrix element-wise subtract.
# md[i,j] = ms1[i,j] - ms2[i,j]
mfsub.[cf|hf|f|d].mm     md, ms1, ms2
mfwsub.[cf|hf|f].mm      md, ms1, ms2  # double-width (widened) output
```
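Widening matters for floating point as well. A half-precision result must be rounded to 11 significant bits, while in these examples the widened single-precision result holds the exact value. The sketch relies on Python's `struct` module, whose `'e'` and `'f'` formats round to IEEE 754 binary16 and binary32. It assumes that `hf` means binary16, that `f` means binary32, and that rounding is round-to-nearest-even. The function names are hypothetical.

```python
import struct


def round_f16(x: float) -> float:
    """Round to the nearest binary16 value (ties to even) via struct's 'e' format.

    Raises OverflowError if x rounds beyond the binary16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_f32(x: float) -> float:
    """Round to the nearest binary32 value via struct's 'f' format."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Inputs are binary16 values, so their exact sum or difference fits in a Python float.
def mfadd_hf(a: float, b: float) -> float:
    return round_f16(a + b)


def mfwadd_hf(a: float, b: float) -> float:
    return round_f32(a + b)  # binary16 inputs, binary32 result


def mfsub_hf(a: float, b: float) -> float:
    return round_f16(a - b)


def mfwsub_hf(a: float, b: float) -> float:
    return round_f32(a - b)  # binary16 inputs, binary32 result


print(mfadd_hf(2048.0, 1.0), mfwadd_hf(2048.0, 1.0))  # 2048.0 2049.0
print(mfsub_hf(2048.0, 0.5), mfwsub_hf(2048.0, 0.5))  # 2048.0 2047.5
```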

```
# Float matrix element-wise minimum.
# md[i,j] = min{ms1[i,j], ms2[i,j]}
mfmin.[cf|hf|f|d].mm     md, ms1, ms2
```

```
# Float matrix element-wise maximum.
# md[i,j] = max{ms1[i,j], ms2[i,j]}
mfmax.[cf|hf|f|d].mm     md, ms1, ms2
```
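The listing does not define how minimum and maximum treat NaNs or signed zeros. The sketch below assumes that they behave like the scalar RISC-V `fmin`/`fmax` instructions: if only one operand is a NaN, the other operand is returned; if both are NaNs, a NaN is returned; and -0.0 is treated as smaller than +0.0. That correspondence and the function names are assumptions.

```python
import math


def mfmin(a: float, b: float) -> float:
    if math.isnan(a) and math.isnan(b):
        return math.nan  # hardware would return its canonical NaN
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # -0.0 is treated as smaller than +0.0
        return a if math.copysign(1.0, a) < 0 else b
    return min(a, b)


def mfmax(a: float, b: float) -> float:
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # +0.0 is treated as larger than -0.0
        return a if math.copysign(1.0, a) > 0 else b
    return max(a, b)


print(mfmin(math.nan, 1.0), mfmax(2.0, math.nan))  # 1.0 2.0
print(mfmin(0.0, -0.0), mfmax(-0.0, 0.0))  # -0.0 0.0
```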

```
# Float matrix element-wise multiply.
# md[i,j] = ms1[i,j] * ms2[i,j]
mfmul.[cf|hf|f|d].mm     md, ms1, ms2
mfwmul.[cf|hf|f].mm      md, ms1, ms2  # double-width (widened) output
```
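The widening multiply matters most for range, because a product can easily exceed the largest finite binary16 value (65504). In this sketch, 300 × 300 overflows binary16 to infinity but is exact in binary32. It assumes that `hf` means binary16, that `f` means binary32, and that rounding is round-to-nearest-even. The function names are hypothetical. Python's `struct` raises `OverflowError` rather than producing an infinity, so `round_f16` maps that case to a signed infinity, which is what round-to-nearest yields.

```python
import math
import struct


def round_f16(x: float) -> float:
    """Round to binary16 (ties to even); overflow becomes a signed infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values that round past the binary16 range;
        # round-to-nearest would produce an infinity of the same sign.
        return math.copysign(math.inf, x)


def round_f32(x: float) -> float:
    """Round to the nearest binary32 value via struct's 'f' format."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Products of two binary16 values are exact in a Python float before rounding.
def mfmul_hf(a: float, b: float) -> float:
    return round_f16(a * b)


def mfwmul_hf(a: float, b: float) -> float:
    return round_f32(a * b)  # binary16 inputs, binary32 result


print(mfmul_hf(300.0, 300.0), mfwmul_hf(300.0, 300.0))  # inf 90000.0
```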

```
# Float matrix element-wise divide.
# md[i,j] = ms1[i,j] / ms2[i,j]
mfdiv.[cf|hf|f|d].mm     md, ms1, ms2
```

```
# Float matrix element-wise square root.
# md[i,j] = sqrt(ms1[i,j])
mfsqrt.[cf|hf|f|d].m     md, ms1
```
NOTE: There are no matrix-scalar or matrix-vector versions of the element-wise instructions. Such an operation can be performed by first broadcasting the scalar or vector into a matrix with a broadcast instruction and then applying the matrix-matrix element-wise instruction.
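The following sketch shows the broadcast-then-operate pattern for a matrix-scalar add and a matrix-vector add. The helpers `broadcast_scalar` and `broadcast_row` are hypothetical stand-ins, since the note does not name the broadcast instruction. Whether a vector is repeated across rows or across columns depends on the broadcast instruction actually used, and this sketch arbitrarily repeats it across rows.

```python
from typing import List

Matrix = List[List[int]]


def broadcast_scalar(x: int, rows: int, cols: int) -> Matrix:
    """Stand-in for a broadcast instruction: fill a rows x cols matrix with x."""
    return [[x] * cols for _ in range(rows)]


def broadcast_row(v: List[int], rows: int) -> Matrix:
    """Stand-in for broadcasting a vector into every row of a matrix."""
    return [list(v) for _ in range(rows)]


def add_mm(ms1: Matrix, ms2: Matrix) -> Matrix:
    """Matrix-matrix element-wise add."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(ms1, ms2)]


m = [[1, 2], [3, 4]]
print(add_mm(m, broadcast_scalar(5, 2, 2)))  # [[6, 7], [8, 9]]
print(add_mm(m, broadcast_row([10, 20], 2)))  # [[11, 22], [13, 24]]
```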