4.2 A 20nm 32-Core 64MB L3 cache SPARC M7 processor

Authors: Penny Li, Jinuk Luke Shin, Georgios Konstadinidis, Francis Schumacher, Venkat Krishnaswamy

DOI: 10.1109/ISSCC.2015.7062931

Keywords: Write-once, Computer science, Parallel computing, Snoopy cache, Cache, Cache pollution, Virtual address space, CPU cache, Cache coloring, Pipeline burst cache, Cache invalidation, Smart Cache, Page cache, Cache algorithms, Operating system, Cache-oblivious algorithm, Bus sniffing, MESI protocol, MESIF protocol

Abstract: The SPARC M7 processor delivers more than 3x throughput performance improvement over its predecessor M6 for commercial applications. It introduces new design features, such as the S4 core, a 64MB L3 cache subsystem with application data integrity, a low-latency, high-throughput on-chip network (OCN), a database analytic accelerator (DAX), fine-grain adaptive power management, and 1.5x higher SerDes I/O bandwidth for memory, coherency, and system interfaces (Fig. 4.2.1) [1]. The enhancements in the S4 core over the S3 core [2] include a new L2 cache scheme, support for visual instruction set (VIS) extensions, virtual address masking, and user-level synchronization instructions, which provide continuous single-thread performance improvement for SPARC processors since T4. In addition, a hierarchical modular approach, called the SPARC core cluster (SCC), is used for the core-L2-L3 cache system. Within an SCC, all four cores share a single 256KB L2 instruction cache, and each core pair has its own 256KB L2 data cache. The L2 caches are organized in 2 banks and 8 ways to deliver greater than 1TB/s of bandwidth to the cores. This is a 2x bandwidth and 1.5x size increase at the same latency as the previous-generation L2 cache scheme. The four cores in an SCC connect to an 8MB, 8-way set-associative partitioned L3 cache. Having a localized L3 cache within the SCC reduces latency by 25%. The chip contains eight SCCs, for a total of 32 cores and 256 threads, with 1.6TB/s of bandwidth. In order to meet the bandwidth requirements from the cores and other agents, an OCN architecture is implemented in place of the crossbar-based approach of previous processors. Each SCC connects to the OCN, which in turn connects to the memory controllers (MCUs), the coherency and I/O systems, and the DAX engines. The customized DAX engine is an effort to optimize Oracle databases. Eight DAX engines handle simple query predicates, decompression, and message passing interrupts across nodes. The DAX provides up to 10x better performance for stream decompression.
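The chip-level totals quoted in the abstract follow from the per-cluster figures. The short Python sketch below simply redoes that arithmetic. The class and field names (`ChipTopology`, `sccs`, `cores_per_scc`, and so on) are hypothetical labels chosen for illustration and are not taken from the paper; the threads-per-core value is derived here from the stated totals rather than quoted from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipTopology:
    """Per-cluster figures as stated in the abstract (field names are illustrative)."""
    sccs: int = 8            # SPARC core clusters per chip
    cores_per_scc: int = 4   # cores sharing one cluster
    total_threads: int = 256 # hardware threads per chip
    l3_mb_per_scc: int = 8   # local L3 partition per cluster, in MB

    @property
    def total_cores(self) -> int:
        return self.sccs * self.cores_per_scc

    @property
    def threads_per_core(self) -> int:
        # Derived value: total threads divided evenly across all cores.
        return self.total_threads // self.total_cores

    @property
    def total_l3_mb(self) -> int:
        return self.sccs * self.l3_mb_per_scc


m7 = ChipTopology()
print(f"cores: {m7.total_cores}")
print(f"threads per core: {m7.threads_per_core}")
print(f"total L3: {m7.total_l3_mb} MB")
```

The results agree with the title: eight clusters of four cores give 32 cores, and eight 8MB partitions give the 64MB L3 cache.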

References (2)
J.L. Shin, K. Tam, D. Huang, B. Petrick, H. Pham, Changku Hwang, Hongping Li, A. Smith, T. Johnson, F. Schumacher, D. Greenhill, A.S. Leon, A. Strong, "A 40nm 16-core 128-thread CMT SPARC SoC processor," International Solid-State Circuits Conference, pp. 98-99, 2010. DOI: 10.1109/ISSCC.2010.5434030
Stephen Phillips, "M7: Next generation SPARC," IEEE Hot Chips Symposium, pp. 1-27, 2014. DOI: 10.1109/HOTCHIPS.2014.7478832