System-level timing analysis and optimizations for hardware compilation

Authors: Girish Venkataramani, Seth Copen Goldstein

DOI:

Keywords:

Abstract: This dissertation presents a System-Level Timing Analysis (SLTA) methodology and micro-architectural optimization framework for use within hardware compilation. As the EDA abstraction layer of preference is raised to the Electronic System Level (ESL), the focus is on describing systems using Transaction Level Modeling (TLM) [CG03, Pas02, Ede06], which is amenable to high-level synthesis. The proposed SLTA for ESL is designed to complement TLM-based synthesis flows by analyzing the sequential dependency behavior of system-level transactions. Using this knowledge, control-path-altering, micro-architecture optimizations are applied iteratively on a well-defined Intermediate Representation (IR). There are two over-arching contributions in this dissertation. First, we describe an Intermediate Representation (IR) as a valuable addition to the infrastructure of the compiler. The IR captures the data/control dependencies of the source program as well as the resource dependencies of the underlying circuit architecture. The IR captures transaction events in the TLM but is also linked to the RTL control-path signals that implement the specification of the communication protocols. By analyzing the properties of the IR, a set of timing entities is produced to characterize system performance. The goal of these entities is to characterize the system's execution attributes. Performance is defined as the cycle time [Bur91, NK94, IP95, Das04], or initiation interval [RG81, Lairn88], which specifies the time between successive iterations of execution. Instead of representing performance as a solitary number (the cycle time), we propose fine-grained building blocks that characterize various aspects of system timing. The primary building block is slack, the difference between the firing time of a given event and the time when it is used downstream. We define the Global Critical Path (GCP) of a system as the longest path of zero-slack (or critical) events. The GCP, in essence, traces the events that directly contribute to the system-wide cycle time. A third entity, global slack, is a derivative of both slack and the GCP, and describes how early an event fires relative to the GCP. All three timing entities are recorded as annotations on the IR, which enables the compiler to easily make value judgments regarding the costs and benefits of local transformations. Second, an optimization framework is built on top of the IR, with support to apply IR-to-IR transformations. We describe a fast update function that can re-compute the timing entities after the changes of a transformation are applied.
This enables the development of algorithms and design exploration tools that scan the design space by applying a series of quality-enhancing transformations. The main benefit of this approach is that it separates analysis from optimizations. Thus, the compiler can evaluate different architectures before committing to the one that will be synthesized. Since the timing entities are excellent indicators of performance, they help in focusing the analysis and/or optimization effort toward the sub-systems that are most critical to performance (or non-critical, if the objective is power/area minimization). A key ingredient that makes this approach practical and scalable is the ability to efficiently update the timing analysis in response to structural changes introduced by transformations. The linear-time update function forms the "glue" that allows re-use of the analysis without having to re-analyze the design.
The dissertation makes the following claims: (1) slack and the GCP model system performance sufficiently accurately, as a fine-grained representation of cycle time; (2) computing the update after a transformation is linear in the size of the design; (3) several existing optimizations can be re-formulated using this methodology, resulting in efficient, quadratic-time, heuristic algorithms that solve hard problems. We present a proof of concept by embedding the methodology in a hardware compiler for ESL, CASH (Compiler for Application Specific Hardware), which synthesizes asynchronous implementations of C programs. Three pipeline optimizations were applied to improve performance and energy efficiency: slack matching, operation chaining, and a hybrid-latch optimization. Experimental results on applying these transformations to optimize several media processing kernels reveal that the average energy-delay and energy-delay-area products improve by about 1.44x and 2x respectively, with peak improvements of 5.3x and 18.5x respectively. Further, using the update algorithm instead of complete re-analysis reduces the total optimization loop runtime from hours down to a few seconds, with a quality degradation of less than 1% in terms of energy-delay and area.
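
The abstract's definitions of slack and of a critical path through zero-slack events can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: it assumes a single-iteration, acyclic event graph with fixed integer latencies, whereas the dissertation targets system-wide cycle time across iterations. The function names (`analyze`, `critical_path`) and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def analyze(latency, preds):
    """Compute firing times and per-edge slack on an acyclic event graph.

    latency: {node: integer delay}  (assumed fixed; an illustration only)
    preds:   {node: [producer nodes whose events this node consumes]}
    """
    fire = {}   # time at which each node's output event fires
    slack = {}  # slack of each edge (producer, consumer)
    for node in TopologicalSorter(preds).static_order():
        inputs = preds.get(node, [])
        # A node can start only when its latest input event has fired.
        ready = max((fire[p] for p in inputs), default=0)
        for p in inputs:
            # Slack: gap between an event firing and its use downstream.
            slack[(p, node)] = ready - fire[p]
        fire[node] = ready + latency[node]
    return fire, slack


def critical_path(fire, slack, preds):
    """Trace zero-slack edges backwards from the last event to fire."""
    node = max(fire, key=fire.get)
    path = [node]
    while preds.get(node):
        # At least one input always has zero slack (the latest arrival).
        node = next(p for p in preds[node] if slack[(p, node)] == 0)
        path.append(node)
    return path[::-1]


# Hypothetical four-node example: a and b feed c, and c feeds d.
latency = {"a": 2, "b": 5, "c": 1, "d": 3}
preds = {"c": ["a", "b"], "d": ["c"]}
fire, slack = analyze(latency, preds)
path = critical_path(fire, slack, preds)
```

In this example the event from `a` arrives 3 time units before `c` can use it, so the edge `a -> c` has slack 3 and is a candidate for a local transformation that trades speed for area or energy, while `b -> c -> d` has zero slack throughout and forms the critical path.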

References (111)
Richard E. Hank, Roger A. Bringmann, William Y. Chen, Scott A. Mahlke, David C. Lin. Effective compiler support for predicated execution using the hyperblock. Instruction-level parallel processors, pp. 161-170 (1995).
Andrew M. Lines. Pipelined Asynchronous Circuits. California Institute of Technology (1998).
Tiberiu Chelcea, Girish Venkataramani, Seth C. Goldstein, Mihai Budiu. Modeling the Global Critical Path in Concurrent Systems (2006).
Tiberiu Chelcea, Girish Venkataramani, Seth C. Goldstein. HLS Support for Unconstrained Memory Accesses (2005).
Ted Eugene Williams. Self-timed rings and their application to division. Stanford University (1992).
Lori Carter, Beth Simon, Brad Calder, Larry Carter, Jeanne Ferrante. Path Analysis and Renaming for Predicated Instruction Scheduling. International Journal of Parallel Programming, vol. 28, pp. 563-588 (2000). DOI: 10.1023/A:1007512717742
Guang Rong Gao. A pipelined code mapping scheme for static data flow computers. Massachusetts Institute of Technology (1986).
Tiberiu Chelcea, Girish Venkataramani, Seth C. Goldstein, Mihai Budiu. C to Asynchronous Dataflow Circuits: An End-to-End Toolflow (2004).