TY - GEN
T1 - Low-span parallel algorithms for the binary-forking model
AU - Ahmad, Zafar
AU - Chowdhury, Rezaul
AU - Das, Rathish
AU - Ganapathi, Pramod
AU - Gregory, Aaron
AU - Javanmard, Mohammad Mahdi
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/7/6
Y1 - 2021/7/6
N2 - The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of (łog n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model and the non-constant synchronization cost makes the task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blowup in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT). Our major results are as follows: (1) A randomized comparison-based sorting algorithm with optimal O(łog n) span and O(nłog n) work, both w.h.p. in n. (2) An optimal O(łog n) span algorithm for Strassen's matrix multiplication (MM) with only a łogłog n -factor blow-up in work as well as a near-optimal O(łog n łogłog łog n) span algorithm with no asymptotic blow-up in work. (3) A near-optimal O(łog n łogłogłog n) span Fast Fourier Transform (FFT) algorithm with less than a łog n-factor blow-up in work for all practical values of n (i.e., n łe 10 ^10,000 ).
AB - The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of (łog n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model and the non-constant synchronization cost makes the task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blowup in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT). Our major results are as follows: (1) A randomized comparison-based sorting algorithm with optimal O(łog n) span and O(nłog n) work, both w.h.p. in n. (2) An optimal O(łog n) span algorithm for Strassen's matrix multiplication (MM) with only a łogłog n -factor blow-up in work as well as a near-optimal O(łog n łogłog łog n) span algorithm with no asymptotic blow-up in work. (3) A near-optimal O(łog n łogłogłog n) span Fast Fourier Transform (FFT) algorithm with less than a łog n-factor blow-up in work for all practical values of n (i.e., n łe 10 ^10,000 ).
KW - Binary-forking model
KW - Fast fourier transform (fft)
KW - Parallel computation
KW - Sorting
KW - Strassen's matrix multiplication
UR - https://www.scopus.com/pages/publications/85109551607
U2 - 10.1145/3409964.3461802
DO - 10.1145/3409964.3461802
M3 - Conference contribution
AN - SCOPUS:85109551607
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 22
EP - 34
BT - SPAA 2021 - Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures
PB - Association for Computing Machinery
T2 - 33rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2021
Y2 - 6 July 2021 through 8 July 2021
ER -