Abstract: Clustered microarchitectures are an effective approach to reducing the penalties caused by wire delays inside a chip. Current superscalar processors in fact have a two-cluster microarchitecture with a naive code partitioning approach: integer instructions are allocated to one cluster and floating-point instructions to the other. This scheme is simple and results in no communication between the two clusters (other than through memory), but it is in general far from optimal because the workload is not evenly distributed most of the time. In fact, when the processor is running integer programs, the workload is extremely unbalanced, since the FP cluster is not used at all. In this work we investigate run-time mechanisms that dynamically distribute the instructions of a program among these two clusters. By optimizing the trade-off between inter-cluster communication penalty and workload balance, the proposed schemes can achieve an average speed-up of 36% for the SpecInt95 benchmark suite.
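To make the trade-off between communication penalty and workload balance concrete, the following is a minimal illustrative sketch of one possible run-time steering heuristic. It is not the mechanism evaluated in this work: the names `SteeringState`, `steer`, and `imbalance_limit`, as well as the threshold value, are assumptions introduced here for illustration only. The heuristic sends an instruction to the cluster that holds most of its source operands (avoiding inter-cluster communication), unless that cluster is already ahead of the other by a given number of instructions (restoring balance).

```python
from dataclasses import dataclass, field


@dataclass
class SteeringState:
    # Hypothetical bookkeeping, not taken from the paper:
    # reg_home maps a register name to the cluster that last produced it,
    # load counts the instructions steered to each of the two clusters.
    reg_home: dict = field(default_factory=dict)
    load: list = field(default_factory=lambda: [0, 0])


def steer(state, srcs, dst, imbalance_limit=4):
    """Choose cluster 0 or 1 for one instruction (illustrative heuristic)."""
    # Count how many source operands currently live in each cluster.
    votes = [0, 0]
    for reg in srcs:
        home = state.reg_home.get(reg)
        if home is not None:
            votes[home] += 1

    # Prefer the cluster holding more operands; break ties by lighter load.
    if votes[0] == votes[1]:
        choice = 0 if state.load[0] <= state.load[1] else 1
    else:
        choice = 0 if votes[0] > votes[1] else 1

    # Override the preference when the chosen cluster is too far ahead,
    # accepting a communication penalty in exchange for better balance.
    if state.load[choice] - state.load[1 - choice] >= imbalance_limit:
        choice = 1 - choice

    state.load[choice] += 1
    if dst is not None:
        state.reg_home[dst] = choice
    return choice
```

In this sketch a long dependence chain stays on one cluster until the load gap reaches `imbalance_limit`, at which point the next instruction migrates to the other cluster. A larger limit favors fewer communications; a smaller one favors balance.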