Authors: Girish Venkataramani, Seth Copen Goldstein
DOI:
Keywords:
Abstract: This dissertation presents a System-Level Timing Analysis (SLTA) methodology and a micro-architectural optimization framework for use within hardware compilation. As the EDA abstraction layer of preference is raised to the Electronic System Level (ESL), there is a focus on describing systems using Transaction Level Modeling (TLM) [CG03, Pas02, Ede06], which is amenable to high-level synthesis. The proposed SLTA methodology for ESL is designed to complement TLM-based synthesis flows by analyzing the sequential dependency behavior of system-level transactions. Using this knowledge, control-path-altering micro-architecture optimizations are applied iteratively on a well-defined Intermediate Representation (IR). There are two over-arching contributions in this dissertation. First, we describe an Intermediate Representation (IR) as a valuable addition to the infrastructure of an ESL compiler. The IR captures the data/control dependencies of the source program as well as the resource dependencies of the underlying circuit architecture. It captures transaction events in the TLM, but is also linked to the RTL control-path signals that implement the specification of communication protocols. By analyzing the properties of the IR, a set of timing entities is produced to characterize system performance. The goal of these entities is to characterize the system's execution attributes. System performance is defined as cycle time [Bur91, NK94, IP95, Das04], or initiation interval [RG81, Lam88], which specifies the time between successive iterations of the system's execution. Instead of representing performance as a solitary number (cycle time), we propose fine-grained building blocks that characterize various aspects of system timing. The primary building block is slack, the difference in time between the firing of a given event and when it is used downstream. We define the Global Critical Path (GCP) of the system as the longest path of zero slack (or critical) events. The GCP, in essence, traces the events that directly contribute to the system-wide cycle time. A third timing entity, global slack, is a derivative of both slack and the GCP, and characterizes how early an event fires before it is used on the GCP. All three entities are recorded as annotations on the IR, which enables the compiler to easily make value judgments regarding the costs and benefits of local transformations.
Second, we have built the optimization framework on top of the IR. It provides support to apply IR-to-IR transformations. We describe a fast update function that can re-compute the timing entities to reflect changes after a transformation is applied.
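This enables the development of algorithms and design exploration tools that scan the design space by applying a series of quality-enhancing transformations. The main benefit of this approach is that it separates timing analysis from optimizations. Thus, the compiler can evaluate different micro-architectures before committing to the one that will be synthesized. Since the timing entities are excellent indicators of system performance, they help in focusing analysis and/or optimization effort toward the sub-systems that are most critical to performance (or non-critical, if the objective is power/area minimization). A key ingredient that makes this approach practical and scalable is the ability to efficiently update the timing entities in response to the structural changes introduced by a transformation. The linear-time update function forms the "glue" that allows the re-use of timing information without having to re-analyze the entire design. The dissertation makes the following claims: (1) slack and the GCP model system performance with sufficient accuracy as a representation of cycle time; (2) re-computing them after a transformation is linear in the size of the design; (3) several existing optimizations can be re-formulated in this methodology, resulting in efficient, quadratic-time, heuristic algorithms that solve hard problems. We present a proof of concept by embedding the framework in an ESL compiler, CASH (Compiler for Application Specific Hardware), which synthesizes asynchronous implementations of C programs. Three pipeline optimizations were implemented to improve energy efficiency: slack matching, operation chaining, and a hybrid latch optimization. Experimental results on several media processing kernels optimized with these transformations reveal average improvements in the energy-delay and energy-delay-area products of about 1.44x and 2x respectively, with peak improvements of 5.3x and 18.5x respectively. Further, using the update algorithm instead of a complete re-analysis reduces the total optimization loop runtime from hours down to a few seconds, with a quality degradation of less than 1% in terms of energy-delay and area.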
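The slack and Global Critical Path definitions in the abstract can be illustrated with a small sketch. It is not the dissertation's algorithm: it assumes an acyclic event graph with non-negative integer delays (the dissertation targets iterating systems with a cycle time), and the function name `analyze`, the edge-dictionary input format, and the example events are all hypothetical. Under those assumptions, an event fires when its latest input arrives, the slack of an edge is how long the producer's result waits before the consumer uses it, and the critical path is traced backwards from the last-firing event through zero-slack edges.

```python
from collections import defaultdict, deque


def analyze(edges):
    """Hypothetical helper. edges maps (producer, consumer) -> delay.

    Returns (fire, slack, path): firing time per event, slack per edge,
    and one longest path made only of zero-slack edges.
    """
    succs, preds = defaultdict(list), defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for (u, v), delay in edges.items():
        succs[u].append((v, delay))
        preds[v].append((u, delay))
        indegree[v] += 1
        nodes.update((u, v))

    # Topological sweep: an event fires when its latest input arrives.
    fire = {n: 0 for n in nodes}
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v, delay in succs[u]:
            fire[v] = max(fire[v], fire[u] + delay)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("this sketch only handles acyclic event graphs")

    # Slack of an edge: time the produced value waits before it is used.
    slack = {(u, v): fire[v] - (fire[u] + d) for (u, v), d in edges.items()}

    # Trace back from the last-firing event through zero-slack edges.
    node = max(order, key=lambda n: fire[n])
    path = [node]
    while preds[node]:
        node = next(u for u, _ in preds[node] if slack[(u, node)] == 0)
        path.append(node)
    path.reverse()
    return fire, slack, path


# Assumed example: event "d" waits for both "b" and "c".
example_edges = {("a", "b"): 2, ("a", "c"): 5, ("b", "d"): 1, ("c", "d"): 1}
fire, slack, path = analyze(example_edges)
print(path)               # ['a', 'c', 'd']
print(slack[("b", "d")])  # 3
```

In this example the result of `b` sits idle for 3 time units before `d` consumes it, so a transformation could slow `b` down by up to that amount (for instance to save energy) without lengthening the critical path `a -> c -> d`. This is the kind of local cost/benefit judgment the abstract attributes to the slack annotations.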