Angr
Rules:
- EagerReturnsSimplifier
- Adds additional return statements to the decompiled code to improve its readability when the number of in-edges to the return node (i.e., the in-degree of the return site) is less than a specified threshold
Core libraries:
- SequenceWalker
- Used to traverse graphs
For each decompiled function, angr constructs a corresponding abstract syntax tree (AST).
When angr modifies the CFG (e.g., applies EagerReturnsSimplifier), it calls SequenceWalker to traverse the graph and modify nodes, e.g., to insert additional return statements into the AST.
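The threshold rule can be illustrated with a minimal sketch over a plain edge list. This is not angr's implementation: the function name `duplicate_returns`, the `max_in_degree` parameter, and the `_copy` naming are hypothetical and chosen only for illustration.

```python
def duplicate_returns(edges, return_node, max_in_degree):
    """Give each predecessor its own copy of the return node.

    Hypothetical sketch: duplication happens only when the return node
    has more than one predecessor and its in-degree is below the
    threshold, mirroring the rule described above.
    """
    preds = [src for src, dst in edges if dst == return_node]
    if len(preds) <= 1 or len(preds) >= max_in_degree:
        return list(edges)  # nothing to duplicate, or too many in-edges

    # Keep every edge that does not target the shared return node.
    new_edges = [(src, dst) for src, dst in edges if dst != return_node]
    # Point each predecessor at a private copy of the return node.
    for i, src in enumerate(preds):
        new_edges.append((src, f"{return_node}_copy{i}"))
    return new_edges


edges = [("cond", "a"), ("cond", "b"), ("a", "ret"), ("b", "ret")]
print(duplicate_returns(edges, "ret", max_in_degree=3))
```

With a threshold of 3, the two in-edges of `ret` are split into two private copies; with a threshold of 2, the graph is returned unchanged.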
Ijk_Boring is the jump kind used to handle conditional branch instructions.
FoxDec
Ghidra
Internally, Ghidra uses debug information stored in the binary in the DWARF format to help recover the prototype of each decompiled function.
For functions that share a name but take different arguments (i.e., function overloading), compilers store multiple entries in the DWARF sections. However, Ghidra may fail to match the correct entry for such a function. Consequently, Ghidra suspends the analysis of this function, which results in a decompiled function that lacks arguments, i.e., void.
In Ghidra, constants are treated similarly to global variables, which means rules will be applied to infer their types (both their signedness and their sizes).
When Ghidra cannot correctly resolve an indirect address, it uses the notion of a partially resolved address, as shown in this expression: “var1.x_y = var2”. This expression means that only the y bytes starting at offset x in var1 become equal to var2.
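The meaning of such a partial assignment can be sketched on a byte buffer. The helper name `partial_assign` is hypothetical, and little-endian byte order is an assumption made only for this illustration; it is not a statement about how Ghidra represents the value internally.

```python
def partial_assign(var1: bytearray, x: int, y: int, var2: int) -> None:
    """Overwrite y bytes of var1, starting at offset x, with var2.

    Assumptions (illustration only): var2 is an unsigned integer whose
    low y bytes are stored in little-endian order.
    """
    if x < 0 or y < 0 or x + y > len(var1):
        raise ValueError("partial write falls outside var1")
    mask = (1 << (8 * y)) - 1          # keep only the low y bytes of var2
    var1[x:x + y] = (var2 & mask).to_bytes(y, "little")


buf = bytearray(8)                      # an 8-byte variable, all zeros
partial_assign(buf, x=4, y=2, var2=0xBEEF)
print(buf.hex())                        # bytes 4 and 5 changed, the rest untouched
```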
TLSH
Trend Micro Locality Sensitive Hash.
The standard TLSH hash is 70 hexadecimal characters long.
All 3-grams from a sliding window of 5 bytes are used to compute an array of bucket counts, which are used to form the digest body.
From these bucket counts, three quartile points are computed (referred to as q1, q2, and q3, respectively).
The digest body is constructed by comparing each bucket count against the quartiles and emitting two bits for each of the 128 buckets, which yields a 32-byte body.
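The quartile and body steps can be sketched as follows. This is a simplified illustration, not the reference TLSH code: the quartile indexing, the 00/01/10/11 code assignment, and the order in which the four 2-bit codes are packed into a byte are assumptions, and the helper names are hypothetical.

```python
def quartiles(counts):
    """Return (q1, q2, q3) such that roughly 25%, 50% and 75% of the
    bucket counts are less than or equal to each value (assumed indexing)."""
    s = sorted(counts)
    n = len(s)
    return s[n // 4 - 1], s[n // 2 - 1], s[3 * n // 4 - 1]


def encode_body(counts):
    """Emit two bits per bucket according to the quartile its count falls in."""
    q1, q2, q3 = quartiles(counts)
    body = bytearray()
    for i in range(0, len(counts), 4):       # four 2-bit codes per byte
        byte = 0
        for j, c in enumerate(counts[i:i + 4]):
            if c <= q1:
                code = 0
            elif c <= q2:
                code = 1
            elif c <= q3:
                code = 2
            else:
                code = 3
            byte |= code << (2 * j)          # packing order is an assumption
        body.append(byte)
    return bytes(body)


counts = list(range(128))                    # toy bucket counts
body = encode_body(counts)
print(len(body), body.hex())
```

With 128 buckets at two bits each, the body is 256 bits, i.e., 32 bytes or 64 hexadecimal characters.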
The digest header is composed of a checksum, the logarithm of the byte string's length, and a compact representation of the histogram of bucket counts given by the quartile ratios q1:q3 and q2:q3.
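The quartile-ratio part of the header can be sketched as two 4-bit values packed into one byte. The scaling by 100 and the reduction modulo 16 follow the description in the TLSH paper as an assumption here; the function names and the nibble order are hypothetical, and the checksum and length fields are omitted.

```python
def quartile_ratios(q1: int, q2: int, q3: int):
    """Return the two 4-bit ratios q1:q3 and q2:q3 (assumed formula)."""
    if q3 == 0:
        raise ValueError("q3 must be non-zero to form the ratios")
    q1_ratio = (q1 * 100 // q3) % 16
    q2_ratio = (q2 * 100 // q3) % 16
    return q1_ratio, q2_ratio


def pack_ratios(q1: int, q2: int, q3: int) -> int:
    """Pack both ratios into one header byte (nibble order is an assumption)."""
    q1_ratio, q2_ratio = quartile_ratios(q1, q2, q3)
    return (q1_ratio << 4) | q2_ratio


print(quartile_ratios(31, 63, 95), hex(pack_ratios(31, 63, 95)))
```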
A TLSH distance of zero indicates that the files are likely identical, and higher scores indicate greater degrees of dissimilarity.