TY - GEN
T1 - Efficient execution of microscopy image analysis on CPU, GPU, and MIC equipped cluster systems
AU - Andrade, G.
AU - Ferreira, R.
AU - Teodoro, George
AU - Rocha, Leonardo
AU - Saltz, Joel H.
AU - Kurc, Tahsin
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/1
Y1 - 2014/12/1
N2 - High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed on a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical dataflow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance-aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
AB - High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed on a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical dataflow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance-aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
UR - https://www.scopus.com/pages/publications/84919454562
U2 - 10.1109/SBAC-PAD.2014.15
DO - 10.1109/SBAC-PAD.2014.15
M3 - Conference contribution
AN - SCOPUS:84919454562
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 89
EP - 96
BT - Proceedings - IEEE 26th International Symposium
PB - IEEE Computer Society
T2 - 26th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2014
Y2 - 22 October 2014 through 24 October 2014
ER -