TY - GEN
T1 - A Computational Model for Tensor Core Units
AU - Chowdhury, Rezaul
AU - Silvestri, Francesco
AU - Vella, Flavio
N1 - Publisher Copyright: © 2020 Owner/Author.
PY - 2020/7/6
Y1 - 2020/7/6
AB - To respond to the need for efficient training and inference of deep neural networks, a plethora of domain-specific architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature of these architectures is the design for efficiently computing a dense matrix product of a given small size. In order to broaden the class of algorithms that exploit these systems, we propose a computational model, named the TCU model, that captures the ability to natively multiply small matrices. We then use the TCU model for designing fast algorithms for several problems, including dense and sparse matrix multiplication and the Discrete Fourier Transform. We finally highlight a relation between the TCU model and the external memory model.
KW - computational model
KW - efficient algorithms
KW - graph problems
KW - hardware accelerators
KW - linear algebra
KW - tensor core
UR - https://www.scopus.com/pages/publications/85088657417
U2 - 10.1145/3350755.3400252
DO - 10.1145/3350755.3400252
M3 - Conference contribution
AN - SCOPUS:85088657417
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 519
EP - 521
BT - SPAA 2020 - Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures
PB - Association for Computing Machinery
T2 - 32nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2020
Y2 - 15 July 2020 through 17 July 2020
ER -