Matrix multiplication operations read two source tiles from the matrix tile registers specified by ms1 and ms2, and accumulate the product into the matrix accumulation register specified by md.

# Unsigned integer matrix multiplication and add, md = md + ms1 * ms2.
mmau.[dw].mm    md, ms1, ms2        # uint64, output no-widen
mmau.[w].mm     md, ms1, ms2        # uint32, output no-widen
mmau.[h].mm     md, ms1, ms2        # uint16, output no-widen
mwmau.[w].mm    md, ms1, ms2        # uint32, output double-widen
mwmau.[h].mm    md, ms1, ms2        # uint16, output double-widen
mqmau.[b].mm    md, ms1, ms2        # uint8, output quad-widen
momau.[hb].mm   md, ms1, ms2        # uint4, output oct-widen

msmau.[dw].mm   md, ms1, ms2        # uint64, output no-widen and saturated
msmau.[w].mm    md, ms1, ms2        # uint32, output no-widen and saturated
msmau.[h].mm    md, ms1, ms2        # uint16, output no-widen and saturated
mswmau.[w].mm   md, ms1, ms2        # uint32, output double-widen and saturated
mswmau.[h].mm   md, ms1, ms2        # uint16, output double-widen and saturated
msqmau.[b].mm   md, ms1, ms2        # uint8, output quad-widen and saturated
msomau.[hb].mm  md, ms1, ms2        # uint4, output oct-widen and saturated

# Signed integer matrix multiplication and add, md = md + ms1 * ms2.
mma.[dw].mm     md, ms1, ms2        # int64, output no-widen
mma.[w].mm      md, ms1, ms2        # int32, output no-widen
mma.[h].mm      md, ms1, ms2        # int16, output no-widen
mwma.[w].mm     md, ms1, ms2        # int32, output double-widen
mwma.[h].mm     md, ms1, ms2        # int16, output double-widen
mqma.[b].mm     md, ms1, ms2        # int8, output quad-widen
moma.[hb].mm    md, ms1, ms2        # int4, output oct-widen

msma.[dw].mm    md, ms1, ms2        # int64, output no-widen and saturated
msma.[w].mm     md, ms1, ms2        # int32, output no-widen and saturated
msma.[h].mm     md, ms1, ms2        # int16, output no-widen and saturated
mswma.[w].mm    md, ms1, ms2        # int32, output double-widen and saturated
mswma.[h].mm    md, ms1, ms2        # int16, output double-widen and saturated
msqma.[b].mm    md, ms1, ms2        # int8, output quad-widen and saturated
msoma.[hb].mm   md, ms1, ms2        # int4, output oct-widen and saturated

# Floating-point matrix multiplication and add, md = md + ms1 * ms2.
mfma.[d].mm     md, ms1, ms2        # 64-bit floating-point
mfma.[f].mm     md, ms1, ms2        # 32-bit floating-point
mfma.[hf].mm    md, ms1, ms2        # 16-bit floating-point

mfwma.[f].mm    md, ms1, ms2        # 32-bit floating-point, output double-widen
mfwma.[hf].mm   md, ms1, ms2        # 16-bit floating-point, output double-widen
mfwma.[cf].mm   md, ms1, ms2        # 8-bit floating-point, output double-widen
mfqma.[cf].mm   md, ms1, ms2        # 8-bit floating-point, output quad-widen

A subset of these instructions is supported according to the implemented standard extensions (Zmi4, Zmi8, etc.).
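The integer forms listed above can be illustrated with a small reference model. This is only a sketch under stated assumptions, not the normative behaviour: the helper names (`saturate`, `wrap`, `matmul_acc`) are hypothetical, ms1 is assumed to be an M x K tile and ms2 a K x N tile (the actual tile layout, including any transposition of ms2, is not defined in this section), non-saturating forms are assumed to wrap modulo the accumulator width, and saturation is assumed to be applied once to the full sum.

```python
def saturate(value, bits, signed):
    """Clamp value to the representable range of a bits-wide integer."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    return max(lo, min(hi, value))


def wrap(value, bits, signed):
    """Keep the low bits of value, reinterpreted as signed if requested."""
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def matmul_acc(md, ms1, ms2, src_bits, widen=1, signed=True, sat=False):
    """Return md + ms1 * ms2 with accumulator elements of src_bits * widen bits.

    Assumptions (not taken from the specification text): ms1 is M x K,
    ms2 is K x N, md is M x N, and narrowing happens once per output element.
    """
    acc_bits = src_bits * widen          # e.g. int8 with quad-widen -> 32 bits
    narrow = saturate if sat else wrap   # saturated vs. wrapping variants
    rows, inner, cols = len(ms1), len(ms2), len(ms2[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            total = md[i][j] + sum(ms1[i][k] * ms2[k][j] for k in range(inner))
            row.append(narrow(total, acc_bits, signed))
        out.append(row)
    return out
```

Under these assumptions, `matmul_acc([[0]], [[127, 127]], [[127], [127]], src_bits=8, widen=4)` models an `mqma.[b].mm`-style operation and yields `[[32258]]`, while `matmul_acc([[0]], [[200]], [[200]], src_bits=16)` overflows a 16-bit accumulator and wraps to `[[-25536]]`; passing `sat=True` clamps the same result to `[[32767]]`.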

The frm field of fcsr selects the rounding mode used by floating-point matrix instructions. The encoding is shown below.

frm   Mnemonic   Meaning
000   RNE        Round to Nearest, ties to Even
001   RTZ        Round towards Zero
010   RDN        Round Down (towards $-\infty$)
011   RUP        Round Up (towards $+\infty$)
100   RMM        Round to Nearest, ties to Max Magnitude
101              Invalid
110              Invalid
111              Invalid
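
The difference between the rounding modes in the frm encoding above shows up most clearly on ties and on negative values. The following sketch decodes an frm value and rounds an exact value to an integer; hardware rounds to the precision of the destination floating-point format instead, so this only illustrates the direction and tie-breaking rules. The names `FRM_MNEMONICS` and `round_with_frm` are hypothetical, and raising an exception for an invalid encoding is an assumption of the sketch, not behaviour stated here.

```python
import math
from fractions import Fraction

# Valid frm encodings and their mnemonics; 101, 110 and 111 are invalid.
FRM_MNEMONICS = {
    0b000: "RNE",
    0b001: "RTZ",
    0b010: "RDN",
    0b011: "RUP",
    0b100: "RMM",
}


def round_with_frm(value, frm):
    """Round an exact value to an integer using the mode selected by frm."""
    mode = FRM_MNEMONICS.get(frm)
    if mode is None:
        raise ValueError(f"invalid frm encoding: {frm:03b}")
    x = Fraction(value)
    if mode == "RNE":
        return round(x)        # Fraction rounding resolves ties to even
    if mode == "RTZ":
        return math.trunc(x)   # drop the fractional part
    if mode == "RDN":
        return math.floor(x)   # towards negative infinity
    if mode == "RUP":
        return math.ceil(x)    # towards positive infinity
    # RMM: nearest, with ties going to the larger magnitude
    magnitude = math.floor(abs(x) + Fraction(1, 2))
    return magnitude if x >= 0 else -magnitude
```

For the tie value 2.5, the five valid modes give 2, 2, 2, 3 and 3 in encoding order; for -2.5 they give -2, -2, -3, -2 and -3.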