Authors: Aleksandar Zlateski, Nir Shavit, Alexander Matveev
DOI:
Keywords:
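Abstract: A system and method of inferring a neural network (NN) on one or more target computing devices. The NN may include a plurality of layers, where at least one layer includes one or more kernels. Embodiments may include: receiving a data structure representing the NN; analyzing the data structure to produce one or more tasks, where each task may include computations pertaining to a kernel of the NN; selecting a sparse version of at least one kernel and replacing the at least one kernel with the sparse version; and compiling the one or more tasks to produce one or more respective tensor columns. The one or more tensor columns are adapted to fit in respective one or more cache memories of the one or more target computing devices, and include task instruction code that represents at least one computation of the kernel of the NN.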