Scalable locality-conscious multithreaded memory allocation

Authors: Scott Schneider, Christos D. Antonopoulos, Dimitrios S. Nikolopoulos

DOI: 10.1145/1133956.1133968

Abstract: We present Streamflow, a new multithreaded memory manager designed for low-overhead, high-performance memory allocation while transparently favoring locality. Streamflow enables low-overhead simultaneous allocation by multiple threads and adapts to sequential allocation at speeds comparable to those of custom sequential allocators. It favors the transparent exploitation of temporal and spatial object access locality and reduces allocator-induced cache conflicts and false sharing, all using a unified design based on segregated heaps. Streamflow introduces an innovative design that uses only synchronization-free operations in the most common case of local allocations and deallocations, while requiring minimal, non-blocking synchronization in the less common case of remote deallocations. Spatial locality at the cache and page level is favored by eliminating small-object headers, reducing allocator-induced conflicts via contiguous allocation of page blocks in physical memory, reducing allocator-induced false sharing by using segregated heaps, and achieving better TLB performance and fewer page faults via the use of superpages. Combining these locality optimizations with a drastic reduction of synchronization and latency overhead allows Streamflow to perform comparably with optimized sequential allocators and, in our experiments on a shared-memory system with four two-way SMT processors, to outperform four state-of-the-art multiprocessor allocators by sizeable margins. The allocation-intensive parallel benchmarks used in our experiments represent a variety of behaviors, including mostly local object allocation-deallocation patterns and producer-consumer allocation-deallocation patterns.
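The core idea described in the abstract (segregated heaps whose owning thread allocates and frees without synchronization, while other threads return blocks through a non-blocking path) can be illustrated with a small C11 sketch. This is a simplified illustration under stated assumptions, not Streamflow's actual implementation: all names (`sized_heap`, `heap_init`, `heap_alloc`, `heap_free_local`, `heap_free_remote`, `heap_owns`, `heap_destroy`) are hypothetical, each heap serves a single size class carved from one contiguous `malloc`'d chunk, and the caller is assumed to know whether it is the owning thread. Free blocks store their list link in their own payload, so live objects carry no header, and ownership is inferred from the address range.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A free block keeps its list link in its own first bytes,
 * so live objects need no per-object header. */
typedef struct free_block {
    struct free_block *next;
} free_block;

/* Hypothetical segregated heap: one size class, one contiguous chunk. */
typedef struct {
    char *chunk;                        /* contiguous backing memory */
    size_t block_size;                  /* rounded-up size of every block */
    size_t nblocks;                     /* number of blocks in the chunk */
    free_block *local_free;             /* owner thread only: no synchronization */
    _Atomic(free_block *) remote_free;  /* non-owner threads push here with CAS */
} sized_heap;

int heap_init(sized_heap *h, size_t block_size, size_t nblocks) {
    const size_t align = alignof(max_align_t);
    if (block_size < sizeof(free_block)) block_size = sizeof(free_block);
    block_size = (block_size + align - 1) / align * align;
    if (nblocks == 0 || block_size > SIZE_MAX / nblocks) return -1;

    h->chunk = malloc(block_size * nblocks);
    if (h->chunk == NULL) return -1;
    h->block_size = block_size;
    h->nblocks = nblocks;

    /* Link blocks so they are handed out in ascending address order. */
    h->local_free = NULL;
    for (size_t i = nblocks; i-- > 0;) {
        free_block *b = (free_block *)(h->chunk + i * block_size);
        b->next = h->local_free;
        h->local_free = b;
    }
    atomic_init(&h->remote_free, NULL);
    return 0;
}

/* Ownership is decided by address range instead of an object header. */
int heap_owns(const sized_heap *h, const void *p) {
    uintptr_t lo = (uintptr_t)h->chunk;
    uintptr_t a = (uintptr_t)p;
    return a >= lo && a - lo < h->block_size * h->nblocks;
}

/* Owner thread only. Returns NULL when the heap is exhausted. */
void *heap_alloc(sized_heap *h) {
    if (h->local_free == NULL) {
        /* Local list empty (less common path): reclaim every remotely
         * freed block with a single atomic exchange. */
        h->local_free = atomic_exchange_explicit(&h->remote_free, NULL,
                                                 memory_order_acquire);
    }
    free_block *b = h->local_free;
    if (b != NULL) h->local_free = b->next;
    return b;
}

/* Owner thread only: plain loads and stores, no atomics. */
void heap_free_local(sized_heap *h, void *p) {
    assert(heap_owns(h, p));
    free_block *b = p;
    b->next = h->local_free;
    h->local_free = b;
}

/* Any non-owner thread: lock-free push onto the remote list. */
void heap_free_remote(sized_heap *h, void *p) {
    assert(heap_owns(h, p));
    free_block *b = p;
    free_block *head = atomic_load_explicit(&h->remote_free, memory_order_relaxed);
    do {
        b->next = head;  /* a failed CAS reloads head, so relink on each retry */
    } while (!atomic_compare_exchange_weak_explicit(
        &h->remote_free, &head, b, memory_order_release, memory_order_relaxed));
}

void heap_destroy(sized_heap *h) {
    free(h->chunk);
    h->chunk = NULL;
}
```

Because non-owner threads only ever push onto the remote list, and the owner removes that list in its entirety with one exchange rather than popping individual nodes, the compare-and-swap loop is not exposed to the ABA problem that affects lock-free stacks with concurrent pops. A full allocator would additionally need a way to locate a block's owning heap from an arbitrary pointer, to obtain page blocks from the operating system, and to manage many size classes; the sketch omits all of these.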
