Authors: Roberto Bisiani, Andreas Georg Nowatzyk
DOI:
Keywords:
Abstract: The system described in this thesis explores the territory between the two classical multiprocessor families: shared memory and message passing machines. Like shared memory systems, the proposed architecture presents the user with a logically uniform address space shared by all processors. This programming model is supported directly by dedicated communication hardware that translates memory references into messages that are exchanged over a network of point-to-point channels. The key parts of this work are the communication system and its integration with contemporary processor and memory components to form a homogeneous, general purpose multiprocessor. The communication system is based on an adaptive routing heuristic that is independent of the actual network topology. High priority was given to optimal use of the physical bandwidth, even under heavy or saturated load conditions. The system can be extended by small incremental upgrades and supports medium haul channels that link more clusters together in a transparent fashion. Integration of the communication system is based on the shared memory model. This avoids the overhead of explicitly sending and receiving messages but introduces the problem of maintaining a consistent memory state. Memory coherence is achieved through the notion of time. A system-wide clock of sufficient precision to sequentialize concurrent accesses is maintained in hardware. As a measure to avoid unnecessary synchronizations, the memory model is relaxed to allow transient inconsistencies. Application code can resort to strongly coherent memory access at the expense of higher latency. The primary tool for assessing the performance of the system is a simulator that can execute application programs for the target system. Nonintrusive instrumentation is provided down to individual clock cycles. A trace-based visualization tool aided both the debugging and the benchmarks.