Show simple item record

A Hierarchical Cache Coherent Protocol

dc.date.accessioned: 2004-10-20T20:29:26Z
dc.date.accessioned: 2018-11-24T10:23:02Z
dc.date.available: 2004-10-20T20:29:26Z
dc.date.available: 2018-11-24T10:23:02Z
dc.date.issued: 1992-09-01
dc.identifier.uri: http://hdl.handle.net/1721.1/7088
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/1721.1/7088
dc.description.abstract: As the number of processors in distributed-memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot-spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared-memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test-and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson [ISCA19], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high-locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance.
If separate controllers for processing protocol requests are included, the protocol scales to 32k-processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be satisfiable locally, and at most 35% of the global references may reach the top level of the hierarchy.
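The abstract describes operations climbing a hierarchy of directories until one that knows the requested block is found, and measures the average height such operations reach. The following is a minimal sketch of that idea, assuming a simple tree of directory nodes; the class and function names are hypothetical illustrations, not PHD's actual data structures, and real PHD transitions (invalidations, write ownership, combining) are omitted.

```python
class DirectoryNode:
    """One directory in the hierarchy; knows which blocks its subtree holds."""
    def __init__(self, parent=None):
        self.parent = parent        # next level up in the hierarchy
        self.known_blocks = set()   # blocks some descendant holds a copy of

    def register(self, block):
        """Record a cached copy of `block`, propagating up to the root
        so that ancestors can redirect requests from other subtrees."""
        node = self
        while node is not None:
            node.known_blocks.add(block)
            node = node.parent

def read(leaf, block):
    """Climb from `leaf` until a directory that knows `block` is found.
    Returns the height reached (0 = satisfied locally), or None if
    no directory in the hierarchy knows the block."""
    node, height = leaf, 0
    while node is not None:
        if block in node.known_blocks:
            return height
        node, height = node.parent, height + 1
    return None

# Tiny three-level hierarchy: one root over two mid-level directories,
# each over one leaf (a processor's local directory).
root = DirectoryNode()
mid_a, mid_b = DirectoryNode(root), DirectoryNode(root)
leaf0, leaf1 = DirectoryNode(mid_a), DirectoryNode(mid_b)

leaf0.register("x")          # block x is cached at leaf0

print(read(leaf0, "x"))      # 0: satisfied locally (hierarchical locality)
print(read(leaf1, "x"))      # 2: must climb to the common ancestor (root)
```

The two reads illustrate the locality condition quoted above: references resolved at height 0 are the "satisfied locally" fraction, while those forced all the way up correspond to references reaching the top level of the hierarchy.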
dc.format.extent: 3979950 bytes
dc.format.extent: 3395110 bytes
dc.language.iso: en_US
dc.title: A Hierarchical Cache Coherent Protocol


Files in this item

Files          Size     Format
AITR-1645.pdf  3.395Mb  application/pdf
AITR-1645.ps   3.979Mb  application/postscript
