Instructions and logic to perform floating-point and integer operations for machine learning

Authors: Tatiana Shpeisman, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Altug Koker

DOI:

Keywords:

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one instruction; and a first compute unit included within the multiprocessor, the at least one instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein the operation includes computing a 32-bit intermediate product of 16-bit operands and a sum based on that product.
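
The mixed-precision arithmetic named in the abstract can be illustrated with a small software emulation. The sketch below is not the patented hardware or its instruction set. It assumes the operands are IEEE 754 half-precision floats and that the accumulator is also 32 bits wide, neither of which the abstract states. The helper names (`to_fp16`, `to_fp32`, `mma_fp16_fp32`) are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma_fp16_fp32(a, b, c):
    """Emulate D = A x B + C on nested lists.

    Assumptions (not taken from the source): A and B hold 16-bit float
    operands, each product is kept as a 32-bit float, and the running
    sum is rounded to a 32-bit float after every addition.
    """
    m, k, n = len(a), len(b), len(b[0])
    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = to_fp32(c[i][j])  # start from the accumulator input
            for p in range(k):
                # 16-bit operands, 32-bit intermediate product
                prod = to_fp32(to_fp16(a[i][p]) * to_fp16(b[p][j]))
                # sum based on the 32-bit product
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[0.5, 0.5], [0.5, 0.5]]
    print(mma_fp16_fp32(a, b, c))
```

Keeping the product and the sum wider than the operands is what lets a result exceed the half-precision range: with A = [[256, 256]] and B = [[256], [256]], each product is 65536, which is already above the largest finite half-precision value (65504), yet the 32-bit sum 131072 is represented exactly.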

References (61)
James S. Blomgren, Terence M. Potter, Type conversion using floating-point unit (2013)
Thomas Glen Dietterich, Adaptive computation and machine learning, MIT Press (1998)
Ian Chi Yan Kwong, Colin Sprinkle, David Conrad Tannenbaum, Ming Y. Siu, Srinivasan Iyer, Stuart F. Oberman, Approach for efficient arithmetic operations (2012)
Paul N. Loewenstein, Mark A. Luttrell, Paul J. Jordan, Load-monitor mwait (2015)
Muthu M. Baskaran, Rajesh J. Bordawekar, Sparse matrix-vector multiplication on graphics processor units (2009)