Authors: Tatiana Shpeisman, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Altug Koker
DOI:
Keywords:
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one instruction; and a first compute unit included within the multiprocessor, the at least one instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein the operation includes computing a 32-bit intermediate product of 16-bit operands and computing a sum based on that product.
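
The matrix multiply and accumulate operation named in the abstract can be illustrated with a small software sketch. The abstract does not say whether the 16-bit operands are integer or floating-point, nor how wide the accumulator is, so the code below assumes signed 16-bit integer operands and a signed 32-bit accumulator with two's-complement wraparound. The function names (`wrap_int32`, `check_int16`, `matrix_multiply_accumulate`) are hypothetical and chosen for illustration only; this is not the hardware behavior claimed in the patent.

```python
def wrap_int32(value):
    """Reduce a Python int to the signed 32-bit range (two's-complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def check_int16(value):
    """Reject operands that do not fit in a signed 16-bit integer (assumed operand type)."""
    if not -32768 <= value <= 32767:
        raise ValueError(f"{value} does not fit in a signed 16-bit operand")
    return value


def matrix_multiply_accumulate(a, b, c):
    """Return D = A * B + C for 2D lists, with 16-bit operands in A and B.

    Each elementwise product is held as a 32-bit intermediate value, and the
    running sum is kept in a 32-bit accumulator seeded from C.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of A must equal rows of B")
    if any(len(row) != cols for row in b):
        raise ValueError("B must be rectangular")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("C must have the shape of A * B")

    d = [[wrap_int32(c[i][j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = d[i][j]
            for k in range(inner):
                # The product of two signed 16-bit values always fits in 32 bits.
                product = wrap_int32(check_int16(a[i][k]) * check_int16(b[k][j]))
                # The sum is based on the 32-bit intermediate product.
                acc = wrap_int32(acc + product)
            d[i][j] = acc
    return d
```

For example, `matrix_multiply_accumulate([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]])` returns `[[20, 23], [44, 51]]`. Widening each product to 32 bits before summing avoids the overflow that would occur if products were truncated back to 16 bits.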