TY - GEN
T1 - A Scheduling Approach to Incremental Maintenance of Datalog Programs
AU - Singh, Shikha
AU - Madaminov, Sergey
AU - Bender, Michael A.
AU - Ferdman, Michael
AU - Johnson, Ryan
AU - Moseley, Benjamin
AU - Ngo, Hung
AU - Nguyen, Dung
AU - Olesen, Soeren
AU - Stirewalt, Kurt
AU - Washburn, Geoffrey
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - In this paper, we study the problem of incremental maintenance of Datalog programs and model it as a scheduling problem on DAGs. We design provably good time-and memory-efficient scheduling algorithms for (re)executing a Datalog program where some (but not necessarily all) of the inputs have changed. We prove that our schedulers, called LevelBased and LevelBased with lookahead, have asymptotically improved running time and space efficiency when compared with benchmark algorithms used in production at LogicBlox.The main result of the paper is a hybrid scheduler, which combines LevelBased with the production LogicBlox scheduler (or any other heuristic scheduler). The hybrid scheduler achieves strong worst-case guarantees and robustness without losing out on the best-case behavior of the production LogicBlox scheduler. Our experiments show that the hybrid scheduler results in similar or improved total execution times compared to LogicBlox scheduler, while consistently reducing the scheduling overhead-by as much as 50% on some datasets. This hybrid scheme requires little to no overhead but provides predictability and reliability, which are crucial in a commercial application such as LogicBlox.
AB - In this paper, we study the problem of incremental maintenance of Datalog programs and model it as a scheduling problem on DAGs. We design provably good time-and memory-efficient scheduling algorithms for (re)executing a Datalog program where some (but not necessarily all) of the inputs have changed. We prove that our schedulers, called LevelBased and LevelBased with lookahead, have asymptotically improved running time and space efficiency when compared with benchmark algorithms used in production at LogicBlox.The main result of the paper is a hybrid scheduler, which combines LevelBased with the production LogicBlox scheduler (or any other heuristic scheduler). The hybrid scheduler achieves strong worst-case guarantees and robustness without losing out on the best-case behavior of the production LogicBlox scheduler. Our experiments show that the hybrid scheduler results in similar or improved total execution times compared to LogicBlox scheduler, while consistently reducing the scheduling overhead-by as much as 50% on some datasets. This hybrid scheme requires little to no overhead but provides predictability and reliability, which are crucial in a commercial application such as LogicBlox.
KW - DAG scheduling
KW - databases
KW - Datalog programs
KW - incremental computing
KW - incremental maintenance
KW - LogicBlox
KW - parallel task scheduling
UR - https://www.scopus.com/pages/publications/85088892301
U2 - 10.1109/IPDPS47924.2020.00093
DO - 10.1109/IPDPS47924.2020.00093
M3 - Conference contribution
AN - SCOPUS:85088892301
T3 - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
SP - 864
EP - 873
BT - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020
Y2 - 18 May 2020 through 22 May 2020
ER -