Fast location of similar code fragments using semantic ‘juice’
Arun Lakhotia, University of Louisiana at Lafayette
Mila Dalla Preda, University of Verona
Roberto Giacobazzi, University of Verona
An abstraction of the semantics of the blocks of a binary is termed ‘juice.’ Whereas the denotational semantics summarizes the computation performed by a block, its juice presents a template of the relationships established by the block. BinJuice is a tool for extracting the ‘juice’ of a binary. It symbolically interprets individual blocks of a binary to extract their semantics, that is, the effect of each block on the program state. The semantics is generalized to juice by replacing register names and literal constants with typed, logical variables. The juice also maintains algebraic constraints between the numeric variables. The resulting juice is therefore a semantic template that is expected to be identical across code variants that differ only by register renaming, memory address allocation, or constant replacement. The terms in juice can be ordered canonically using the linear order presented in the paper. Consequently, semantically equivalent (more precisely, similar) code fragments can be identified by a simple structural comparison of their juice, or by comparing hashes of their juice. BinJuice cannot find all equivalent constructs, since doing so would amount to solving the Halting Problem. Nevertheless, it significantly improves on the state of the art both in computational complexity and in the set of equivalences it can establish. Preliminary results show that juice is effective in pairing code variants created by post-compile obfuscating transformations.
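The generalization step described in the abstract (registers and constants replaced by logical variables, terms put in a canonical order, juice compared by hash) can be illustrated with a minimal Python sketch. This is not BinJuice's implementation. The semantics format (a dict from each written register to an expression tuple), the helper names `shape`, `order_operands`, `juice`, and `juice_hash`, the shape-based sort standing in for the paper's linear order, and the use of SHA-256 are all hypothetical. Variables here are untyped, and the only algebraic constraint kept is equality: identical constants map to the same variable.

```python
import hashlib

# Hypothetical semantics format (not BinJuice's): a dict mapping each
# register written by a block to the expression it holds afterwards.
# Expressions are register names (str), integer literals (int), or
# tuples (op, arg1, arg2, ...).
COMMUTATIVE = {"+", "*", "&", "|", "^"}


def shape(expr):
    """Erase register names and constant values, keeping only structure."""
    if isinstance(expr, str):
        return "R"
    if isinstance(expr, int):
        return "N"
    op, *args = expr
    return (op, *(shape(a) for a in args))


def order_operands(expr):
    """Sort operands of commutative operators by shape, so a+4 and 4+a agree."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [order_operands(a) for a in args]
    if op in COMMUTATIVE:
        args.sort(key=lambda a: repr(shape(a)))  # stable: ties keep input order
    return (op, *args)


def juice(semantics):
    """Replace registers with R0, R1, ... and constants with N0, N1, ..."""
    eqs = [(lhs, order_operands(rhs)) for lhs, rhs in semantics.items()]
    # Hypothetical stand-in for the paper's linear order: sort by erased shape.
    eqs.sort(key=lambda eq: repr((shape(eq[0]), shape(eq[1]))))
    names = {}  # (kind, register-or-constant) -> logical variable

    def rename(expr):
        if isinstance(expr, (str, int)):
            kind = "R" if isinstance(expr, str) else "N"
            if (kind, expr) not in names:
                index = sum(1 for k in names if k[0] == kind)
                names[(kind, expr)] = f"{kind}{index}"
            # Equal constants share one variable (an equality constraint).
            return names[(kind, expr)]
        op, *args = expr
        return (op, *(rename(a) for a in args))

    return tuple((rename(lhs), rename(rhs)) for lhs, rhs in eqs)


def juice_hash(semantics):
    """Hash the juice so candidate matches can be found by lookup."""
    return hashlib.sha256(repr(juice(semantics)).encode()).hexdigest()


# Two variants differing in registers, constants, and operand order,
# plus one block that computes a different relationship.
block_a = {"eax": ("+", "ebx", 4), "ecx": ("*", "ebx", 2)}
block_b = {"esi": ("+", 8, "edi"), "edx": ("*", "edi", 3)}
block_c = {"eax": ("-", "ebx", 4)}

print(juice(block_a))  # (('R0', ('*', 'N0', 'R1')), ('R2', ('+', 'N1', 'R1')))
print(juice_hash(block_a) == juice_hash(block_b))  # True
print(juice_hash(block_a) == juice_hash(block_c))  # False
```

Here `block_a` and `block_b` produce the same juice and hash even though they use different registers, constants, and operand order, while `block_c` does not. Because ties between equations or operands of the same shape fall back to input order, this sketch is only approximately canonical; a full treatment would rely on a true linear order over juice terms, as the abstract describes.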
Full Citation: Lakhotia, Arun, Mila Dalla Preda, and Roberto Giacobazzi. “Fast location of similar code fragments using semantic ‘juice’.” In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, pp. 1-6. 2013.
Link to Research Paper: https://dl.acm.org/doi/abs/10.1145/2430553.2430558