TY - GEN
T1 - Provably good multicore cache performance for divide-and-conquer algorithms
AU - Blelloch, Guy E.
AU - Chowdhury, Rezaul A.
AU - Gibbons, Phillip B.
AU - Ramachandran, Vijaya
AU - Chen, Shimin
AU - Kozuch, Michael
PY - 2008
Y1 - 2008
N2 - This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L 1) caches and a large shared (L 2) cache on chip. We consider a broad class of parallel divide-andconquer algorithms and present a new on-line scheduler, CONTROLLED-PDF, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L 1 and L 2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptoticallyoptimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-densevector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.
AB - This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L 1) caches and a large shared (L 2) cache on chip. We consider a broad class of parallel divide-andconquer algorithms and present a new on-line scheduler, CONTROLLED-PDF, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L 1 and L 2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptoticallyoptimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-densevector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.
UR - https://www.scopus.com/pages/publications/58449090994
M3 - Conference contribution
AN - SCOPUS:58449090994
SN - 9780898716474
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 501
EP - 510
BT - Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
T2 - 19th Annual ACM-SIAM Symposium on Discrete Algorithms
Y2 - 20 January 2008 through 22 January 2008
ER -