Adaptive Cache Compression for High-Performance Processors

Authors: Alaa R. Alameldeen, David A. Wood

DOI: 10.1145/1028176.1006719

Keywords: Cache invalidation, Cache coloring, Page cache, Cache-oblivious algorithm, Computer science, Cache pollution, Parallel computing, Memory architecture, CPU cache, Cache, Smart Cache, Write-once, Cache algorithms

Abstract: Modern processors use two or more levels of cache memories to bridge the rising disparity between processor and memory speeds. Compression can improve cache performance by increasing effective cache capacity and eliminating misses. However, decompressing cache lines also increases cache access latency, potentially degrading performance. In this paper, we develop an adaptive policy that dynamically adapts to the costs and benefits of cache compression. We propose a two-level cache hierarchy where the L1 cache holds uncompressed data and the L2 cache dynamically selects between compressed and uncompressed storage. The L2 cache is 8-way set-associative with LRU replacement, where each set can store up to eight compressed lines but has space for only four uncompressed lines. On each L2 reference, the LRU stack depth and compressed size determine whether compression (could have) eliminated a miss or incurs an unnecessary decompression overhead. Based on this outcome, the adaptive policy updates a single global saturating counter, which predicts whether to allocate lines in compressed or uncompressed form. We evaluate adaptive cache compression using full-system simulation and a range of benchmarks. We show that compression can improve performance for memory-intensive commercial workloads by up to 17%. However, always using compression hurts performance for low-miss-rate benchmarks, due to unnecessary decompression overhead, degrading performance by up to 18%. By dynamically monitoring workload behavior, the adaptive policy achieves comparable benefits from compression, while never degrading performance by more than 0.4%.
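
The counter update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the penalty values, the saturation bound, the tie-breaking rule at zero, and the names `AdaptivePredictor`, `update`, `should_compress`, and `fits_if_compressed` are all assumptions made for illustration. Only the set geometry (eight tags, room for four uncompressed lines) and the single global saturating counter come from the abstract.

```python
from dataclasses import dataclass

# From the abstract: each set has room for four uncompressed lines.
UNCOMPRESSED_WAYS = 4

# Assumed values for illustration only; the abstract does not give them.
DECOMPRESSION_PENALTY = 5    # hypothetical cycles to decompress one line
MISS_PENALTY = 400           # hypothetical cycles for an L2 miss
COUNTER_LIMIT = 1 << 18      # hypothetical saturation bound


@dataclass
class AdaptivePredictor:
    """Single global saturating counter that weighs benefit against cost."""
    counter: int = 0

    def update(self, stack_depth, is_hit, is_compressed,
               fits_if_compressed=False):
        """Classify one L2 reference and adjust the counter.

        stack_depth is the 1-based LRU stack position (1 = most recent).
        fits_if_compressed is a hypothetical flag saying that a missed line
        would still have been resident had the set been compressed.
        """
        if is_hit and stack_depth > UNCOMPRESSED_WAYS:
            # Hit beyond the uncompressed capacity: compression avoided a miss.
            delta = MISS_PENALTY
        elif is_hit and is_compressed:
            # Would have hit anyway, so the decompression was unnecessary.
            delta = -DECOMPRESSION_PENALTY
        elif not is_hit and fits_if_compressed:
            # Compression could have eliminated this miss.
            delta = MISS_PENALTY
        else:
            # Compression neither helped nor hurt this reference.
            delta = 0
        # Saturate instead of overflowing.
        self.counter = max(-COUNTER_LIMIT,
                           min(COUNTER_LIMIT, self.counter + delta))
        return delta

    def should_compress(self):
        # Assumption: a strictly positive counter predicts "compress".
        return self.counter > 0


predictor = AdaptivePredictor()
predictor.update(stack_depth=2, is_hit=True, is_compressed=True)   # cost
predictor.update(stack_depth=6, is_hit=True, is_compressed=True)   # benefit
print(predictor.counter, predictor.should_compress())
```

Weighting each outcome by its assumed latency lets a few avoided misses outweigh many cheap decompressions, which is one plausible reading of how a single counter can track both costs and benefits.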
