Authors: Thomas M. Conte, Burzin A. Patel, Kishore N. Menezes, J. Stan Cox
DOI: 10.1007/BF03356747
Keywords:
Abstract: Profile-based optimization can be used for instruction scheduling, loop scheduling, data preloading, function in-lining, and instruction cache performance enhancement. However, these techniques have not been embraced by software vendors because programs instrumented for profiling run significantly slower, an awkward compile-run-recompile sequence is required, and a test input suite must be collected and validated for each program. This paper introduces hardware-based profiling that uses traditional branch handling hardware to generate profile information in real time. Techniques are presented for both one-level and two-level branch hardware organizations. The approach produces high accuracy with small slowdown in execution (0.4% to 4.6%). This allows a program to be profiled while it is used, eliminating the need for a test input suite. With contemporary processors driven increasingly by compiler support, hardware-based profiling is important for high-performance systems.