TY - GEN
T1 - An out-of-core implementation of the COLUMBUS massively-parallel multireference configuration interaction program
AU - Dachsel, Holger
AU - Nieplocha, Jarek
AU - Harrison, Robert
N1 - Publisher Copyright:
© 1998 IEEE.
PY - 1998
Y1 - 1998
AB - In this paper, we describe a novel parallelization approach we developed to solve the largest multireference configuration interaction (MRCI) problem ever attempted. From the mathematical perspective, the program solves the eigenvalue problem for a very large, sparse, symmetric Hamilton matrix. Using an out-of-core approach, a shared-memory programming model, improved data compression algorithms, and dynamic load balancing, we were able to solve a problem six times larger than previously reported. The potential energy curve for the chromium dimer was calculated with a Hamilton matrix of dimension 1.3 billion (1,295,937,374). This task involved moving 1.5 terabytes of data between main memory and secondary storage per MRCI iteration. Furthermore, by employing Active Messages and user-level striping to combine multiple files on local disks on the IBM SP into a single logically shared file, we reduced the execution time of the program by a factor of three compared with our initial implementation on top of the IBM PIOFS parallel filesystem.
UR - https://www.scopus.com/pages/publications/85106908916
U2 - 10.1109/SC.1998.10027
DO - 10.1109/SC.1998.10027
M3 - Conference contribution
AN - SCOPUS:85106908916
T3 - Proceedings of the International Conference on Supercomputing
BT - SC 1998 - Proceedings of the ACM/IEEE Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 1998 ACM/IEEE Conference on Supercomputing, SC 1998
Y2 - 7 November 1998 through 13 November 1998
ER -