Authors: Penny Li, Jinuk Luke Shin, Georgios Konstadinidis, Francis Schumacher, Venkat Krishnaswamy
DOI: 10.1109/ISSCC.2015.7062931
Keywords: Write-once, Computer science, Parallel computing, Snoopy cache, Cache, Cache pollution, Virtual address space, CPU cache, Cache coloring, Pipeline burst cache, Cache invalidation, Smart Cache, Page cache, Cache algorithms, Operating system, Cache-oblivious algorithm, Bus sniffing, MESI protocol, MESIF protocol
Abstract: The SPARC M7 processor delivers more than 3× throughput performance improvement over its predecessor, the M6, for commercial applications. It introduces new design features, such as the S4 core, a 64MB L3 cache subsystem with application data integrity, a low-latency, high-throughput on-chip network (OCN), a database analytic accelerator (DAX), fine-grain adaptive power management, and 1.5× higher SerDes I/O bandwidth for memory, coherency, and system interfaces (Fig. 4.2.1) [1]. The enhancements in the S4 core over the S3 core [2] include a new L2 cache scheme, support for visual instruction set (VIS) extensions, virtual address masking, and user-level synchronization instructions, to provide continuous single-thread performance improvement over processors since the T4. In addition, a hierarchical modular approach, called the SPARC core cluster (SCC), is used for the core-L2-L3 system. Within the SCC, all four cores share a single 256KB L2 instruction cache, and each core pair has its own L2 data cache. The L2 caches are organized in 2 banks and 8 ways to deliver greater than 1TB/s of bandwidth to the cores. This is 2× the bandwidth and a 1.5× increase in size at the same latency as the previous-generation L2 scheme. The L2 caches connect to an 8MB, 8-way set-associative partitioned L3 cache. Having a localized L3 within the SCC reduces L3 latency by 25%. The chip contains eight SCCs, for a total of 32 cores and 256 threads, with 1.6TB/s of L3 bandwidth. In order to meet the bandwidth requirements from the cores and other agents, an OCN architecture is implemented in place of the crossbar-based architecture of previous processors. Each SCC connects to the OCN, which in turn connects to the memory controllers (MCUs), coherency systems, and DAX engines. The customized DAX engine is an effort to optimize Oracle databases. Eight DAX engines handle simple query predicates, decompression, message passing, and interrupts across nodes. The DAX provides up to 10× better stream decompression.
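The per-cluster and whole-chip figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the quoted numbers, not anything taken from the paper. All constant and function names are hypothetical, and the per-SCC L3 bandwidth assumes the 1.6TB/s total is split evenly across the eight clusters, which the abstract does not state.

```python
# Figures quoted in the abstract (names are illustrative, not from the paper).
SCC_COUNT = 8                    # SPARC core clusters per chip
CORES_PER_SCC = 4                # cores sharing one cluster
L3_PER_SCC_MB = 8                # local L3 partition per cluster
TOTAL_THREADS = 256              # hardware threads per chip
TOTAL_L3_BANDWIDTH_TBPS = 1.6    # aggregate L3 bandwidth


def chip_totals():
    """Derive whole-chip figures from the per-cluster figures."""
    total_cores = SCC_COUNT * CORES_PER_SCC
    total_l3_mb = SCC_COUNT * L3_PER_SCC_MB
    threads_per_core = TOTAL_THREADS // total_cores
    # Assumption: the aggregate L3 bandwidth is divided evenly among SCCs.
    l3_bandwidth_per_scc_tbps = TOTAL_L3_BANDWIDTH_TBPS / SCC_COUNT
    return {
        "total_cores": total_cores,
        "total_l3_mb": total_l3_mb,
        "threads_per_core": threads_per_core,
        "l3_bandwidth_per_scc_tbps": l3_bandwidth_per_scc_tbps,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the derived totals match the abstract: 32 cores, 64MB of L3, and 8 threads per core.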