Scalar Operand Networks: Design, Implementation, and Analysis
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple ALUs. Previous superscalar designs implemented this interconnect using centralized structures that do not scale with increasing ILP demands. In search of scalability, recent microprocessor designs in industry and academia exhibit a trend toward distributed resources such as partitioned register files, banked caches, multiple independent compute pipelines, and even multiple program counters. Some of these partitioned microprocessor designs have begun to implement bypassing and operand transport using point-to-point interconnects. We call interconnects optimized for scalar data transport, whether centralized or distributed, scalar operand networks. Although these networks share many of the challenges of multiprocessor networks such as scalability and deadlock avoidance, they have many unique requirements, including ultra-low latencies (a few cycles versus tens of cycles) and ultra-fast operation-operand matching. This paper discusses the unique properties of scalar operand networks (SONs), examines alternative ways of implementing them, and introduces the AsTrO taxonomy to distinguish between them. It discusses the design of two alternative networks in the context of the Raw microprocessor, and presents detailed timing, area and energy statistics for a real implementation. The paper also presents a 5-tuple performance model for SONs and analyzes their performance sensitivity to network properties for ILP workloads.