Patent No. | 10,599,429 |
---|---|
Issue Date | March 24, 2020 |
Title | Variable Format, Variable Sparsity Matrix Multiplication Instruction |
Inventorship | Mark A. Anders, Hillsboro, OR (US); Himanshu Kaul, Portland, OR (US); Sanu Mathew, Portland, OR (US) |
Assignee | Intel Corporation, Santa Clara, CA (US) |

1. A processor comprising:

fetch and decode circuitry to fetch and decode a variable format, variable sparsity matrix multiplication (VFVSMM) instruction having fields to specify locations where each of A, B, and C matrices having (M×K), (K×N), and (M×N) elements, respectively, is stored; and

execution circuitry, operating in a dense-dense mode, in response to the decoded VFVSMM instruction, to route each row of the specified A matrix, staggering subsequent rows, into a corresponding row of a processing array having (M×N) processing units, and route each column of the specified B matrix, staggering subsequent columns, into a corresponding column of the processing array, and

wherein each of the (M×N) processing units is to generate K products of matching A-matrix and B-matrix elements received from the specified A and B matrices, respectively, a match to exist when the B-matrix element has the same row address as a column address of the A-matrix element, and to accumulate each generated product with a corresponding element of the specified C-matrix having a same relative position as a position of the processing unit in the processing array.
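To make the dense-dense dataflow of claim 1 concrete, the sketch below models it in software. It is not Intel's circuit; it assumes an output-stationary, systolic-style array in which each A element is tagged with its column address k, each B element with its row address k, row i of A enters array row i delayed by i cycles, column j of B enters array column j delayed by j cycles, and operands advance one processing unit per cycle. The function name `dense_dense_vfvsmm` and the register model are hypothetical. Under this staggering, A[i][k] and B[k][j] both reach processing unit (i, j) at cycle i + j + k, so the address match succeeds and each unit accumulates exactly K products onto its C element.

```python
from typing import List, Optional, Tuple

# An operand in flight: (address k, value), or None for an empty slot (bubble).
Tagged = Optional[Tuple[int, float]]


def dense_dense_vfvsmm(A: List[List[float]], B: List[List[float]],
                       C: List[List[float]]) -> List[List[float]]:
    """Hypothetical cycle-level model of an M x N output-stationary array.

    A elements carry their column address and flow left to right along array
    rows; B elements carry their row address and flow top to bottom along
    array columns. Row i of A is delayed i cycles, column j of B is delayed j.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    acc = [row[:] for row in C]                  # each unit starts from its C element
    a_reg: List[List[Tagged]] = [[None] * N for _ in range(M)]
    b_reg: List[List[Tagged]] = [[None] * N for _ in range(M)]
    products = [[0] * N for _ in range(M)]       # products generated per unit

    # The last pair (k = K-1) meets at unit (M-1, N-1) on cycle M+N+K-3.
    for t in range(M + N + K - 2):
        # A operands shift one unit right; B operands shift one unit down.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Staggered injection: row i of A starts at cycle i, column j of B at cycle j.
        for i in range(M):
            k = t - i
            a_reg[i][0] = (k, A[i][k]) if 0 <= k < K else None
        for j in range(N):
            k = t - j
            b_reg[0][j] = (k, B[k][j]) if 0 <= k < K else None
        # A unit multiplies only when B's row address equals A's column address.
        for i in range(M):
            for j in range(N):
                a, b = a_reg[i][j], b_reg[i][j]
                if a is not None and b is not None and a[0] == b[0]:
                    acc[i][j] += a[1] * b[1]
                    products[i][j] += 1

    # Sanity check: in dense-dense mode every unit generates exactly K products.
    assert all(p == K for row in products for p in row)
    return acc


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[1, 0], [0, 1]]
    print(dense_dense_vfvsmm(A, B, C))  # [[20, 22], [43, 51]]
```

The example computes C + A·B; A·B is [[19, 22], [43, 50]], so adding C gives [[20, 22], [43, 51]]. The explicit address comparison is redundant in dense-dense mode, since the staggering alone lines up matching k values, but it mirrors the match condition recited in the claim.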
