TY - GEN
T1 - Extracting Instruction semantics via symbolic execution of code generators
AU - Hasabnis, Niranjan
AU - Sekar, R.
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/11/1
Y1 - 2016/11/1
N2 - Binary analysis and instrumentation form the basis of many tools and frameworks for software debugging, security hardening, and monitoring. Accurate modeling of instruction semantics is paramount in this regard, as errors can lead to program crashes, or worse, bypassing of security checks. Semantic modeling is a daunting task for modern processors such as x86 and ARM that support over a thousand instructions, many of them with complex semantics. This paper describes a new approach to automate this semantic modeling task. Our approach leverages instruction semantics knowledge that is already encoded into today's production compilers such as GCC and LLVM. Such an approach can greatly reduce manual effort, and more importantly, avoid errors introduced by manual modeling. Furthermore, it is applicable to any of the numerous architectures already supported by the compiler. In this paper, we develop a new symbolic execution technique to extract instruction semantics from a compiler's source code. Unlike previous applications of symbolic execution that were focused on identifying a single program path that violates a property, our approach addresses the all paths problem, extracting the entire input/ output behavior of the code generator. We have applied it successfully to the 120K lines of C-code used in GCC's code generator to extract x86 instruction semantics. To demonstrate architecture-neutrality, we have also applied it to AVR, a processor used in the popular Arduino platform.
AB - Binary analysis and instrumentation form the basis of many tools and frameworks for software debugging, security hardening, and monitoring. Accurate modeling of instruction semantics is paramount in this regard, as errors can lead to program crashes, or worse, bypassing of security checks. Semantic modeling is a daunting task for modern processors such as x86 and ARM that support over a thousand instructions, many of them with complex semantics. This paper describes a new approach to automate this semantic modeling task. Our approach leverages instruction semantics knowledge that is already encoded into today's production compilers such as GCC and LLVM. Such an approach can greatly reduce manual effort, and more importantly, avoid errors introduced by manual modeling. Furthermore, it is applicable to any of the numerous architectures already supported by the compiler. In this paper, we develop a new symbolic execution technique to extract instruction semantics from a compiler's source code. Unlike previous applications of symbolic execution that were focused on identifying a single program path that violates a property, our approach addresses the all paths problem, extracting the entire input/ output behavior of the code generator. We have applied it successfully to the 120K lines of C-code used in GCC's code generator to extract x86 instruction semantics. To demonstrate architecture-neutrality, we have also applied it to AVR, a processor used in the popular Arduino platform.
KW - Code Generators
KW - Instruction-Set Semantics Extraction
KW - Symbolic Execution
UR - https://www.scopus.com/pages/publications/84997207622
U2 - 10.1145/2950290.2950335
DO - 10.1145/2950290.2950335
M3 - Conference contribution
AN - SCOPUS:84997207622
T3 - Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering
SP - 301
EP - 313
BT - FSE 2016 - Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
A2 - Su, Zhendong
A2 - Zimmermann, Thomas
A2 - Cleland-Huang, Jane
PB - Association for Computing Machinery
T2 - 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016
Y2 - 13 November 2016 through 18 November 2016
ER -