TY - GEN
T1 - Cache-efficient dynamic programming algorithms for multicores
AU - Chowdhury, Rezaul Alam
AU - Ramachandran, Vijaya
PY - 2008
Y1 - 2008
N2 - We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP with a private cache for each core, S-CMP with a single cache shared by all cores, and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), Gaussian Elimination Paradigm (GEP), and parenthesis problem. For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm. We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.
AB - We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP with a private cache for each core, S-CMP with a single cache shared by all cores, and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), Gaussian Elimination Paradigm (GEP), and parenthesis problem. For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm. We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.
KW - Cache-efficiency
KW - Distributed cache
KW - Multicore
KW - Parallelism
KW - Shared cache
UR - https://www.scopus.com/pages/publications/57349161938
U2 - 10.1145/1378533.1378574
DO - 10.1145/1378533.1378574
M3 - Conference contribution
AN - SCOPUS:57349161938
SN - 9781595939739
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 207
EP - 216
BT - SPAA'08 - Proceedings of the 20th Annual Symposium on Parallelism in Algorithms and Architectures
T2 - 20th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA'08
Y2 - 14 June 2008 through 16 June 2008
ER -