Skip to content

Latest commit

 

History

History
154 lines (120 loc) · 5.21 KB

load-store.adoc

File metadata and controls

154 lines (120 loc) · 5.21 KB

Load Instructions

Load a matrix tile from memory.

# md destination, rs1 base address, rs2 row byte stride

# For left matrix, A
# tile size = mtilem * mtilek
mlae8.m  md, (rs1), rs2         #  8-bit left tile load
mlae16.m md, (rs1), rs2         # 16-bit left tile load
mlae32.m md, (rs1), rs2         # 32-bit left tile load
mlae64.m md, (rs1), rs2         # 64-bit left tile load

# For right matrix, B
# tile size = mtilek * mtilen
mlbe8.m  md, (rs1), rs2         #  8-bit right tile load
mlbe16.m md, (rs1), rs2         # 16-bit right tile load
mlbe32.m md, (rs1), rs2         # 32-bit right tile load
mlbe64.m md, (rs1), rs2         # 64-bit right tile load

# For output matrix, C
# tile size = mtilem * mtilen
mlce8.m  md, (rs1), rs2         #  8-bit output tile load
mlce16.m md, (rs1), rs2         # 16-bit output tile load
mlce32.m md, (rs1), rs2         # 32-bit output tile load
mlce64.m md, (rs1), rs2         # 64-bit output tile load

Load a matrix tile from memory, where the matrix on memory is transposed.

# md destination, rs1 base address, rs2 row byte stride

# For left matrix, A
# tile size = mtilek * mtilem
mlate8.m  md, (rs1), rs2        #  8-bit left tile load
mlate16.m md, (rs1), rs2        # 16-bit left tile load
mlate32.m md, (rs1), rs2        # 32-bit left tile load
mlate64.m md, (rs1), rs2        # 64-bit left tile load

# For right matrix, B
# tile size = mtilen * mtilek
mlbte8.m  md, (rs1), rs2        #  8-bit right tile load
mlbte16.m md, (rs1), rs2        # 16-bit right tile load
mlbte32.m md, (rs1), rs2        # 32-bit right tile load
mlbte64.m md, (rs1), rs2        # 64-bit right tile load

# For output matrix, C
# tile size = mtilen * mtilem
mlcte8.m  md, (rs1), rs2        #  8-bit output tile load
mlcte16.m md, (rs1), rs2        # 16-bit output tile load
mlcte32.m md, (rs1), rs2        # 32-bit output tile load
mlcte64.m md, (rs1), rs2        # 64-bit output tile load

Store Instructions

Store a matrix tile to memory.

# ms3 store data, rs1 base address, rs2 row byte stride

# For left matrix, A
# tile size = mtilem * mtilek
msae8.m  ms3, (rs1), rs2        #  8-bit left tile store
msae16.m ms3, (rs1), rs2        # 16-bit left tile store
msae32.m ms3, (rs1), rs2        # 32-bit left tile store
msae64.m ms3, (rs1), rs2        # 64-bit left tile store

# For right matrix, B
# tile size = mtilek * mtilen
msbe8.m  ms3, (rs1), rs2        #  8-bit right tile store
msbe16.m ms3, (rs1), rs2        # 16-bit right tile store
msbe32.m ms3, (rs1), rs2        # 32-bit right tile store
msbe64.m ms3, (rs1), rs2        # 64-bit right tile store

# For output matrix, C
# tile size = mtilem * mtilen
msce8.m  ms3, (rs1), rs2        #  8-bit output tile store
msce16.m ms3, (rs1), rs2        # 16-bit output tile store
msce32.m ms3, (rs1), rs2        # 32-bit output tile store
msce64.m ms3, (rs1), rs2        # 64-bit output tile store

Save a matrix tile to memory, where the matrix on memory is transposed.

# ms3 store data, rs1 base address, rs2 row byte stride

# For left matrix, A
# tile size = mtilek * mtilem
msate8.m  ms3, (rs1), rs2       #  8-bit left tile store
msate16.m ms3, (rs1), rs2       # 16-bit left tile store
msate32.m ms3, (rs1), rs2       # 32-bit left tile store
msate64.m ms3, (rs1), rs2       # 64-bit left tile store

# For right matrix, B
# tile size = mtilen * mtilek
msbte8.m  ms3, (rs1), rs2       #  8-bit right tile store
msbte16.m ms3, (rs1), rs2       # 16-bit right tile store
msbte32.m ms3, (rs1), rs2       # 32-bit right tile store
msbte64.m ms3, (rs1), rs2       # 64-bit right tile store

# For output matrix, C
# tile size = mtilen * mtilem
mscte8.m  ms3, (rs1), rs2       #  8-bit output tile store
mscte16.m ms3, (rs1), rs2       # 16-bit output tile store
mscte32.m ms3, (rs1), rs2       # 32-bit output tile store
mscte64.m ms3, (rs1), rs2       # 64-bit output tile store

Whole Matrix Load & Store Instructions

Load a whole tile matrix from memory without considering the size.

mltre8.m   md, (rs1), rs2       #  8-bit whole matrix load
mltre16.m  md, (rs1), rs2       # 16-bit whole matrix load
mltre32.m  md, (rs1), rs2       # 32-bit whole matrix load
mltre64.m  md, (rs1), rs2       # 64-bit whole matrix load

Load a whole accumulation matrix from memory without considering the size.

mlacce8.m  md, (rs1), rs2       #  8-bit whole matrix load
mlacce16.m md, (rs1), rs2       # 16-bit whole matrix load
mlacce32.m md, (rs1), rs2       # 32-bit whole matrix load
mlacce64.m md, (rs1), rs2       # 64-bit whole matrix load

Store a whole tile matrix to memory without considering the size.

mstre8.m   ms3, (rs1), rs2      #  8-bit whole matrix store
mstre16.m  ms3, (rs1), rs2      # 16-bit whole matrix store
mstre32.m  ms3, (rs1), rs2      # 32-bit whole matrix store
mstre64.m  ms3, (rs1), rs2      # 64-bit whole matrix store

Store a whole accumulation matrix to memory without considering the size.

msacce8.m  ms3, (rs1), rs2      #  8-bit whole matrix store
msacce16.m ms3, (rs1), rs2      # 16-bit whole matrix store
msacce32.m ms3, (rs1), rs2      # 32-bit whole matrix store
msacce64.m ms3, (rs1), rs2      # 64-bit whole matrix store
Note
Whole matrix load and store instructions are usually used for context saving and restoring.