
Selective Vectorization for Short-Vector Instructions

dc.date.accessioned: 2009-12-18T19:30:12Z
dc.date.accessioned: 2018-11-26T22:26:12Z
dc.date.available: 2009-12-18T19:30:12Z
dc.date.available: 2018-11-26T22:26:12Z
dc.date.issued: 2009-12-18
dc.identifier.uri: http://hdl.handle.net/1721.1/50235
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/1721.1/50235
dc.description.abstract (en_US): Multimedia extensions are nearly ubiquitous in today's general-purpose processors. These extensions consist primarily of a set of short-vector instructions that apply the same opcode to a vector of operands. Vector instructions introduce a data-parallel component to processors that exploit instruction-level parallelism, and present an opportunity for increased performance. In fact, ignoring a processor's vector opcodes can leave a significant portion of the available resources unused. In order for software developers to find short-vector instructions generally useful, however, the compiler must target these extensions with complete transparency and consistent performance. This paper describes selective vectorization, a technique for balancing computation across a processor's scalar and vector units. Current approaches for targeting short-vector instructions directly adopt vectorizing technology first developed for supercomputers. Traditional vectorization, however, can lead to a performance degradation since it fails to account for a processor's scalar resources. We formulate selective vectorization in the context of software pipelining. Our approach creates software pipelines with shorter initiation intervals, and therefore, higher performance. A key aspect of selective vectorization is its ability to manage transfer of operands between vector and scalar instructions. Even when operand transfer is expensive, our technique is sufficiently sophisticated to achieve significant performance gains. We evaluate selective vectorization on a set of SPEC FP benchmarks. On a realistic VLIW processor model, the approach achieves whole-program speedups of up to 1.35x over existing approaches. For individual loops, it provides speedups of up to 1.75x.
dc.format.extent (en_US): 25 p.
dc.rights (en): Creative Commons Attribution 3.0 Unported
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/
dc.subject (en_US): SIMD
dc.subject (en_US): Vectorization
dc.subject (en_US): Compiler
dc.title (en_US): Selective Vectorization for Short-Vector Instructions


Files in this item

Files                        Size       Format
MIT-CSAIL-TR-2009-064.pdf    505.9Kb    application/pdf


Except where otherwise noted, this item's license is described as Creative Commons Attribution 3.0 Unported