A Software Approach to Unifying Multicore Caches
Multicore chips will have large amounts of fast on-chip cache memory, along with relatively slow DRAM interfaces. The on-chip cache memory, however, will be fragmented and spread over the chip; this distributed arrangement is hard for certain kinds of applications to exploit efficiently, and can lead to needless slow DRAM accesses. First, data accessed from many cores may be duplicated in many caches, reducing the amount of distinct data cached. Second, data in a cache distant from the accessing core may be slow to fetch via the cache coherence protocol. Third, software on each core can only allocate space in the small fraction of total cache memory that is local to that core. A new approach called software cache unification (SCU) addresses these challenges for applications that would be better served by a large shared cache. SCU chooses the on-chip cache in which to cache each item of data. As an application thread reads data items, SCU moves the thread to the core whose on-chip cache contains each item. This allows the thread to read the data quickly if it is already on-chip; if it is not, moving the thread causes the data to be loaded into the chosen on-chip cache. A new file cache for Linux, called MFC, uses SCU to improve performance of file-intensive applications, such as Unix file utilities. An evaluation on a 16-core AMD Opteron machine shows that MFC improves the throughput of file utilities by a factor of 1.6. Experiments with a platform that emulates future machines with less DRAM throughput per core shows that MFC will provide benefit to a growing range of applications.