Authors: William W. Macy, Eric Q. Li, Yen-Kuang Chen, Minerva M. Yeung
DOI:
Keywords:
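Abstract: A method and apparatus for performing matrix transformations including multiply-add operations and byte shuffle operations on packed data in a processor. In one embodiment, two rows of content byte elements are shuffled to generate a first and a second packed data, respectively including elements of a first two columns and of a second two columns. A third packed data including sums of products is generated from the first packed data and elements from two rows of a matrix by a multiply-add instruction. A fourth packed data including sums of products is generated from the second packed data and elements from two more rows of the matrix by another multiply-add instruction. Corresponding sums of products of the third and fourth packed data are then summed to generate two rows of a product matrix. Elements of the product matrix may be generated in an order that further facilitates a second matrix multiplication.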
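The data flow described in the abstract can be illustrated with a small scalar simulation. This is only a sketch under stated assumptions, not the patented implementation: plain Python lists stand in for packed registers, the rows are assumed to hold four elements each, and the helper names `shuffle`, `multiply_add`, and `two_rows_times_matrix` are hypothetical. The `multiply_add` helper assumes the common packed multiply-add behavior of multiplying corresponding elements and adding each adjacent pair of products.

```python
def shuffle(src, indices):
    """Model a shuffle: pick elements of src in the order given by indices."""
    return [src[i] for i in indices]


def multiply_add(x, y):
    """Model a packed multiply-add: multiply element-wise, then add each
    adjacent pair of products (assumed behavior, half as many results)."""
    assert len(x) == len(y) and len(x) % 2 == 0
    return [x[i] * y[i] + x[i + 1] * y[i + 1] for i in range(0, len(x), 2)]


def two_rows_times_matrix(row0, row1, m):
    """Multiply two 4-element content rows by a 4-row matrix m."""
    assert len(row0) == 4 and len(row1) == 4 and len(m) == 4
    n = len(m[0])          # number of output columns
    src = row0 + row1      # both content rows held in one "register"

    # First packed data: elements of columns 0 and 1 of both rows.
    first = shuffle(src, [0, 1] * n + [4, 5] * n)
    # Second packed data: elements of columns 2 and 3 of both rows.
    second = shuffle(src, [2, 3] * n + [6, 7] * n)

    # Coefficients interleaved from matrix rows 0,1 and rows 2,3,
    # repeated once for each of the two content rows.
    coef01 = [m[k][j] for j in range(n) for k in (0, 1)] * 2
    coef23 = [m[k][j] for j in range(n) for k in (2, 3)] * 2

    third = multiply_add(first, coef01)    # partial sums from columns 0,1
    fourth = multiply_add(second, coef23)  # partial sums from columns 2,3

    # Sum corresponding partial sums to get two rows of the product matrix.
    sums = [p + q for p, q in zip(third, fourth)]
    return sums[:n], sums[n:]


if __name__ == "__main__":
    m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(two_rows_times_matrix([1, 2, 3, 4], [5, 6, 7, 8], m))
```

Each multiply-add covers only two of the four terms in a dot product, which is why two of them and a final add are needed for every pair of output rows.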