TY - GEN
T1 - Toward optimizing latency under throughput constraints for application workflows on clusters
AU - Vydyanathan, Nagavijayalakshmi
AU - Catalyurek, Umit V.
AU - Kurc, Tahsin M.
AU - Sadayappan, Ponnuswamy
AU - Saltz, Joel H.
PY - 2007
Y1 - 2007
N2 - In many application domains, it is desirable to meet some user-defined performance requirement while minimizing resource usage and optimizing additional performance parameters. For example, application workflows with real-time constraints may have strict throughput requirements and desire a low latency or response-time. The structure of these workflows can be represented as directed acyclic graphs of coarse-grained application tasks with data dependences. In this paper, we develop a novel mapping and scheduling algorithm that minimizes the latency of workflows that act on a stream of input data, while satisfying throughput requirements. The algorithm employs pipelined parallelism and intelligent clustering and replication of tasks to meet throughput requirements. Latency is minimized by exploiting task parallelism and reducing communication overheads. Evaluation using synthetic benchmarks and application task graphs shows that our algorithm 1) consistently meets throughput requirements even when other existing schemes fail, 2) produces lower-latency schedules, and 3) results in lesser resource usage.
UR - https://www.scopus.com/pages/publications/38049180870
U2 - 10.1007/978-3-540-74466-5_20
DO - 10.1007/978-3-540-74466-5_20
M3 - Conference contribution
AN - SCOPUS:38049180870
SN - 9783540744658
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 173
EP - 183
BT - Euro-Par 2007 Parallel Processing - 13th International Euro-Par Conference, Proceedings
PB - Springer Verlag
T2 - 13th International Euro-Par Conference on Parallel Processing, Euro-Par 2007
Y2 - 28 August 2007 through 31 August 2007
ER -