Exploiting Vector Parallelism in Software Pipelined Loops

dc.date.accessioned	2005-12-19T23:46:12Z
dc.date.accessioned	2018-11-24T10:23:54Z
dc.date.available	2005-12-19T23:46:12Z
dc.date.available	2018-11-24T10:23:54Z
dc.date.issued	2005-06-03
dc.identifier.uri	http://hdl.handle.net/1721.1/30423
dc.identifier.uri	http://repository.aust.edu.ng/xmlui/handle/1721.1/30423
dc.description.abstract	An emerging trend in processor design is the incorporation of short vector instructions into the ISA. In fact, vector extensions have appeared in most general-purpose microprocessors. To utilize these instructions, traditional vectorization technology can be used to identify and exploit data parallelism. In contrast, efficient use of a processor's scalar resources is typically achieved through ILP techniques such as software pipelining. In order to attain the best performance, it is necessary to utilize both sets of resources. This paper presents a novel approach for exploiting vector parallelism in a software pipelined loop. At its core is a method for judiciously partitioning operations between vector and scalar resources. The proposed algorithm (i) lowers the burden on the scalar resources by offloading computation to the vector functional units, and (ii) partially (or fully) inhibits the optimizations when full vectorization will decrease performance. This results in better resource usage and allows for software pipelining with shorter initiation intervals. Although our techniques complement statically scheduled machines most naturally, we believe they are applicable to any architecture that tightly integrates support for ILP and data parallelism. An important aspect of the proposed methodology is its ability to manage explicit communication of operands between vector and scalar instructions. Our methodology also allows for a natural handling of misaligned vector memory operations. For architectures that provide hardware support for misaligned references, software pipelining effectively hides the latency of these potentially expensive instructions. When explicit alignment is required in software, our algorithm accounts for these extra costs and vectorizes only when it is profitable. Finally, our heuristic can take advantage of alignment information where it is available. We evaluate our methodology using several DSP and SPEC FP benchmarks. Compared to software pipelining, our approach is able to achieve an average speedup of 1.30x and 1.18x for the two benchmark sets, respectively.
dc.format.extent	14 p.
dc.format.extent	19708112 bytes
dc.format.extent	690985 bytes
dc.language.iso	en_US
dc.title	Exploiting Vector Parallelism in Software Pipelined Loops

Files in this item

Files	Size	Format
MIT-CSAIL-TR-2005-039.pdf	690.9Kb	application/pdf
MIT-CSAIL-TR-2005-039.ps	19.70Mb	application/postscript

This item appears in the following Collection(s)

Computer Science and Artificial Intelligence Lab (CSAIL) [2625]
