The matrix extension adds 8 architectural Tile Registers (tr0-tr7) for input tile matrices and 8 architectural Accumulation Registers (acc0-acc7) for output accumulation matrices.
A Tile Register has a fixed MLEN bits of state, where each row has RLEN bits. As a result, there are MLEN/RLEN rows for each tile register in logic.
An Accumulation Register has a fixed \$\rm{MLEN} \times \rm{AMUL}\$ bits of state, where each row has \$\rm{RLEN} \times \rm{AMUL}\$ bits. As a result, there are MLEN/RLEN rows for each accumulation register in logic.
An input matrix of matrix multiplication instruction only uses one tile register, and large matrix must be split according to the size of tile defined by MLEN and RLEN.
For widening instructions, each output element is wider than input one. To match the width of input and output, an output matrix may be written back to a wider accumulation register whose length are specified by MLEN x AMUL.