Load a matrix tile from memory.
# md destination, rs1 base address, rs2 row byte stride
# For left matrix, A
# tile size = mtilem * mtilek
mlae8.m md, (rs1), rs2 # 8-bit left tile load
mlae16.m md, (rs1), rs2 # 16-bit left tile load
mlae32.m md, (rs1), rs2 # 32-bit left tile load
mlae64.m md, (rs1), rs2 # 64-bit left tile load
# For right matrix, B
# tile size = mtilek * mtilen
mlbe8.m md, (rs1), rs2 # 8-bit right tile load
mlbe16.m md, (rs1), rs2 # 16-bit right tile load
mlbe32.m md, (rs1), rs2 # 32-bit right tile load
mlbe64.m md, (rs1), rs2 # 64-bit right tile load
# For output matrix, C
# tile size = mtilem * mtilen
mlce8.m md, (rs1), rs2 # 8-bit output tile load
mlce16.m md, (rs1), rs2 # 16-bit output tile load
mlce32.m md, (rs1), rs2 # 32-bit output tile load
mlce64.m md, (rs1), rs2 # 64-bit output tile load
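The addressing implied by the base address and row byte stride operands can be sketched as a small software model. This is an illustrative sketch, not the normative semantics: `load_tile` and its parameter names are hypothetical, elements are assumed to be little-endian unsigned integers, and `rows`/`cols` stand in for the relevant pair of `mtilem`, `mtilek`, and `mtilen`.

```python
# Illustrative model only; load_tile and its parameters are hypothetical names.
def load_tile(mem, base, stride, rows, cols, ebytes):
    """Return a rows x cols tile read from the byte buffer `mem`.

    Row i starts at base + i * stride (the rs2 row byte stride).
    Elements within a row are contiguous and assumed little-endian.
    """
    tile = []
    for i in range(rows):
        row_addr = base + i * stride
        row = []
        for j in range(cols):
            addr = row_addr + j * ebytes
            row.append(int.from_bytes(mem[addr:addr + ebytes], "little"))
        tile.append(row)
    return tile


# Example: an mlae16.m-style load with mtilem = 2 and mtilek = 3 from a
# source matrix whose rows are 8 bytes apart.
mem = bytes(range(32))
a_tile = load_tile(mem, base=0, stride=8, rows=2, cols=3, ebytes=2)
```

Because the stride is given in bytes and is independent of the tile width, the same model covers loading a sub-tile out of a larger row-major matrix.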
Load a matrix tile from memory, where the matrix in memory is transposed.
# md destination, rs1 base address, rs2 row byte stride
# For left matrix, A
# tile size = mtilek * mtilem
mlate8.m md, (rs1), rs2 # 8-bit transposed left tile load
mlate16.m md, (rs1), rs2 # 16-bit transposed left tile load
mlate32.m md, (rs1), rs2 # 32-bit transposed left tile load
mlate64.m md, (rs1), rs2 # 64-bit transposed left tile load
# For right matrix, B
# tile size = mtilen * mtilek
mlbte8.m md, (rs1), rs2 # 8-bit transposed right tile load
mlbte16.m md, (rs1), rs2 # 16-bit transposed right tile load
mlbte32.m md, (rs1), rs2 # 32-bit transposed right tile load
mlbte64.m md, (rs1), rs2 # 64-bit transposed right tile load
# For output matrix, C
# tile size = mtilen * mtilem
mlcte8.m md, (rs1), rs2 # 8-bit transposed output tile load
mlcte16.m md, (rs1), rs2 # 16-bit transposed output tile load
mlcte32.m md, (rs1), rs2 # 32-bit transposed output tile load
mlcte64.m md, (rs1), rs2 # 64-bit transposed output tile load
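One reading of the transposed load is that the row byte stride steps through the rows of the transposed matrix in memory, while the register receives the tile in its usual orientation. The sketch below models that interpretation; `load_tile_transposed` is a hypothetical name, and the little-endian unsigned element format is an assumption.

```python
# Illustrative model only; names and element format are assumptions.
def load_tile_transposed(mem, base, stride, rows, cols, ebytes):
    """Return a rows x cols tile from a cols x rows matrix held in `mem`."""
    tile = [[0] * cols for _ in range(rows)]
    for r in range(cols):        # r walks the rows stored in memory
        for c in range(rows):    # c walks the elements within a memory row
            addr = base + r * stride + c * ebytes
            tile[c][r] = int.from_bytes(mem[addr:addr + ebytes], "little")
    return tile


# Example: an mlate8.m-style load. Memory holds mtilek = 3 rows of
# mtilem = 2 elements each, 4 bytes apart; the result is 2 x 3.
mem = bytes(range(16))
at_tile = load_tile_transposed(mem, base=0, stride=4, rows=2, cols=3, ebytes=1)
```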
Store a matrix tile to memory.
# ms3 store data, rs1 base address, rs2 row byte stride
# For left matrix, A
# tile size = mtilem * mtilek
msae8.m ms3, (rs1), rs2 # 8-bit left tile store
msae16.m ms3, (rs1), rs2 # 16-bit left tile store
msae32.m ms3, (rs1), rs2 # 32-bit left tile store
msae64.m ms3, (rs1), rs2 # 64-bit left tile store
# For right matrix, B
# tile size = mtilek * mtilen
msbe8.m ms3, (rs1), rs2 # 8-bit right tile store
msbe16.m ms3, (rs1), rs2 # 16-bit right tile store
msbe32.m ms3, (rs1), rs2 # 32-bit right tile store
msbe64.m ms3, (rs1), rs2 # 64-bit right tile store
# For output matrix, C
# tile size = mtilem * mtilen
msce8.m ms3, (rs1), rs2 # 8-bit output tile store
msce16.m ms3, (rs1), rs2 # 16-bit output tile store
msce32.m ms3, (rs1), rs2 # 32-bit output tile store
msce64.m ms3, (rs1), rs2 # 64-bit output tile store
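A store can be modeled as the mirror image of a load: each tile row is written starting at the base address plus the row index times the byte stride. This is a sketch under assumptions, with `store_tile` as a hypothetical name and little-endian unsigned elements assumed.

```python
# Illustrative model only; store_tile is a hypothetical name.
def store_tile(mem, base, stride, tile, ebytes):
    """Write `tile` (a list of rows) into the bytearray `mem`."""
    for i, row in enumerate(tile):
        for j, value in enumerate(row):
            addr = base + i * stride + j * ebytes
            mem[addr:addr + ebytes] = value.to_bytes(ebytes, "little")


# Example: an msce32.m-style store of a 2 x 2 tile into a destination
# whose rows are 12 bytes apart. Bytes between rows are left untouched.
mem = bytearray(b"\xff" * 24)
store_tile(mem, base=0, stride=12, tile=[[1, 2], [3, 4]], ebytes=4)
```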
Store a matrix tile to memory, where the matrix in memory is transposed.
# ms3 store data, rs1 base address, rs2 row byte stride
# For left matrix, A
# tile size = mtilek * mtilem
msate8.m ms3, (rs1), rs2 # 8-bit transposed left tile store
msate16.m ms3, (rs1), rs2 # 16-bit transposed left tile store
msate32.m ms3, (rs1), rs2 # 32-bit transposed left tile store
msate64.m ms3, (rs1), rs2 # 64-bit transposed left tile store
# For right matrix, B
# tile size = mtilen * mtilek
msbte8.m ms3, (rs1), rs2 # 8-bit transposed right tile store
msbte16.m ms3, (rs1), rs2 # 16-bit transposed right tile store
msbte32.m ms3, (rs1), rs2 # 32-bit transposed right tile store
msbte64.m ms3, (rs1), rs2 # 64-bit transposed right tile store
# For output matrix, C
# tile size = mtilen * mtilem
mscte8.m ms3, (rs1), rs2 # 8-bit transposed output tile store
mscte16.m ms3, (rs1), rs2 # 16-bit transposed output tile store
mscte32.m ms3, (rs1), rs2 # 32-bit transposed output tile store
mscte64.m ms3, (rs1), rs2 # 64-bit transposed output tile store
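Under the same interpretation as the transposed load, a transposed store writes element (i, j) of the register tile to row j, column i of the matrix in memory. The following sketch is illustrative only; `store_tile_transposed` is a hypothetical name and the element format is assumed to be little-endian unsigned.

```python
# Illustrative model only; store_tile_transposed is a hypothetical name.
def store_tile_transposed(mem, base, stride, tile, ebytes):
    """Write `tile` into the bytearray `mem` with rows and columns swapped."""
    for i, row in enumerate(tile):
        for j, value in enumerate(row):
            # Register column j selects the memory row; register row i
            # selects the element within that memory row.
            addr = base + j * stride + i * ebytes
            mem[addr:addr + ebytes] = value.to_bytes(ebytes, "little")


# Example: an msate8.m-style store of a 2 x 3 tile. Memory receives
# 3 rows of 2 elements each, 4 bytes apart.
mem = bytearray(12)
store_tile_transposed(mem, base=0, stride=4, tile=[[1, 2, 3], [4, 5, 6]], ebytes=1)
```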
Load a whole tile matrix from memory, regardless of the configured tile size.
mltre8.m md, (rs1), rs2 # 8-bit whole matrix load
mltre16.m md, (rs1), rs2 # 16-bit whole matrix load
mltre32.m md, (rs1), rs2 # 32-bit whole matrix load
mltre64.m md, (rs1), rs2 # 64-bit whole matrix load
Load a whole accumulation matrix from memory, regardless of the configured tile size.
mlacce8.m md, (rs1), rs2 # 8-bit whole matrix load
mlacce16.m md, (rs1), rs2 # 16-bit whole matrix load
mlacce32.m md, (rs1), rs2 # 32-bit whole matrix load
mlacce64.m md, (rs1), rs2 # 64-bit whole matrix load
Store a whole tile matrix to memory, regardless of the configured tile size.
mstre8.m ms3, (rs1), rs2 # 8-bit whole matrix store
mstre16.m ms3, (rs1), rs2 # 16-bit whole matrix store
mstre32.m ms3, (rs1), rs2 # 32-bit whole matrix store
mstre64.m ms3, (rs1), rs2 # 64-bit whole matrix store
Store a whole accumulation matrix to memory, regardless of the configured tile size.
msacce8.m ms3, (rs1), rs2 # 8-bit whole matrix store
msacce16.m ms3, (rs1), rs2 # 16-bit whole matrix store
msacce32.m ms3, (rs1), rs2 # 32-bit whole matrix store
msacce64.m ms3, (rs1), rs2 # 64-bit whole matrix store
Note: Whole matrix load and store instructions are usually used for context saving and restoring.
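The context save and restore use case can be sketched as a round trip that copies every row of a register, ignoring any tile-size setting. This is a rough model under stated assumptions: `REG_ROWS` and `REG_ROW_BYTES` are hypothetical placeholders for the implementation-defined register geometry, and `save_whole` and `restore_whole` are hypothetical names.

```python
# Hypothetical register geometry; real values are implementation-defined.
REG_ROWS = 4
REG_ROW_BYTES = 8


def save_whole(mem, base, stride, reg):
    """Copy every row of `reg` into the bytearray `mem`."""
    for i in range(REG_ROWS):
        addr = base + i * stride
        mem[addr:addr + REG_ROW_BYTES] = reg[i]


def restore_whole(mem, base, stride):
    """Rebuild a full register image from `mem`."""
    return [
        bytearray(mem[base + i * stride:base + i * stride + REG_ROW_BYTES])
        for i in range(REG_ROWS)
    ]


# Example: save a register to a densely packed save area, then restore it.
reg = [bytearray([i] * REG_ROW_BYTES) for i in range(REG_ROWS)]
save_area = bytearray(REG_ROWS * REG_ROW_BYTES)
save_whole(save_area, base=0, stride=REG_ROW_BYTES, reg=reg)
restored = restore_whole(save_area, base=0, stride=REG_ROW_BYTES)
```

Copying the full register means the saved image does not depend on the tile-size settings in effect when the context switch occurs.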