Exploiting concurrency in a general-purpose one-instruction computer architecture
Computer performance is suffering from diminishing returns: the rising cost
of implementing complex hardware optimizations and of raising clock
frequency no longer yields the gains in computational ability and power
efficiency consumers demand. Notable products, including a generation of Intel
Pentium 4 processors, have been cancelled as a result. This sudden hiccup in a
historically predictable performance road map has inspired research and industrial
communities to investigate architectures, some rather unorthodox, that complete
work more quickly and more efficiently.
One such computer architecture under development, Fleet, exposes fine-grained
instruction-level concurrency, addresses the growing costs of on-chip communication,
and promotes simplicity in the underlying hardware design. This
one-instruction computer transports data using simple move operations. The
globally asynchronous architecture promotes high modularity, allowing specialized
configurations of the architecture to be generated quickly with low hardware
and software complexity.
The Armada architecture presented in this thesis expands on Fleet by introducing
constructs that exploit thread-level concurrency. The proposals herein
aim to increase the performance efficiency of Fleet and other communication-centric
architectures. Trade-offs between software and hardware complexity and
between the static and dynamic division of labor are investigated through the
implementation and study of an Armada microarchitecture and an Armada compiler
created for this research. This thesis explores the merits and pitfalls of this
unique architecture as a basis for future general-purpose computers.