Introduction

The decompiler attempts to translate from low-level representations of computer programs into high-level representations. Thus it needs to model concepts from both the low-level machine hardware domain and from the high-level software programming domain.

Understanding the classes within the source code that implement these models provides the quickest inroad into obtaining an overall understanding of the code.

We list all these fundemental classes here, loosely grouped as follows. There is one set of classes that describe the Syntax Trees, which are built up from the original p-code, and transformed during the decompiler's simplification process. The Translation classes do the actual building of the syntax trees from binary executables, and the Transformation classes do the actual work of transforming the syntax trees. Finally there is the High-level classes, which for the decompiler represents recovered information, describing familiar software development concepts, like datatypes, prototypes, symbols, variables, etc.

Syntax Trees

AddrSpace
- A place within the reverse engineering model where data can be stored. The typical address spaces are ram, modeling the main databus of a processor, and register, modeling a processors on board registers. Data is stored a byte at a time at offsets within the AddrSpace.
Address
- An AddrSpace and an offset within the space forms the Address of the byte at that offset.
Varnode
- A contiguous set of bytes, given by an Address and a size, encoding a single value in the model. In terms of SSA syntax tree, a Varnode is also a node in the tree.
SeqNum
- A sequence number that extends Address for distinguishing PcodeOps describing a single instruction.
- Overview of SeqNum
PcodeOp
- A single p-code operation. A single machine instruction is translated into (possibly several) operations in this Register Transfer Language.
- Overview of PcodeOp
BlockBasic
- A maximal sequence of p-code operations that always executes from the first PcodeOp to the last.
- Overview of BlockBasic
Funcdata
- The root object holding all information about a function, including: the p-code syntax tree, prototype, and local symbol information.
- Overview of Funcdata

Translation

Transformation

Action
Rule

High-level Representation

Overview of SeqNum

A sequence number is a form of extended address for multiple p-code operations that may be associated with the same address. There is a normal Address field. There is a time field which is a static value, determined when an operation is created, that guarantees the uniqueness of the SeqNum. There is also an order field which preserves order information about operations within a basic block. This value may change if the syntax tree is manipulated.

Address & getAddr();          // get the Address field
uintm     getTime();          // get the time field
uintm     getOrder();         // get the order field

Overview of PcodeOp

A single operation in the p-code language. It has, at most, one Varnode output, and some number of Varnode inputs. The inputs are operated on depending on the opcode of the instruction, producing the output.

OpCode       code();             // get the opcode for this op
Address &    getAddr();          // get Address of the associated processor instruction
                                 // which generated this op.
SeqNum &     getSeqNum();        // get the full unique identifier for this op
int4         numInput();         // get number of Varnode inputs to this op
Varnode *    getOut();           // get Varnode output
Varnode *    getIn(int4 i);      // get (one of the) Varnode inputs
BlockBasic * getParent();        // get basic block containing this op
bool         isDead();           // op may no longer be in syntax tree
bool         isCall();           // various categories of op
bool         isBranch();
bool         isBoolOutput();

Overview of BlockBasic

A sequence of PcodeOps with a single path of execution.

int4         sizeOut();         // get number of paths flowing out of this block
int4         sizeIn();          // get number of paths flowing into this block
BlockBasic *getIn(int4 i)       // get (one of the) blocks flowing into this
BlockBasic *getOut(int4 i)      // get (one of the) blocks flowing out of this
SeqNum &    getStart();         // get SeqNum of first operation in block
SeqNum &    getStop();          // get SeqNum of last operation in block
BlockBasic *getImmedDom();      // get immediate dominator block
iterator    beginOp();          // get iterator to first PcodeOp in block
iterator    endOp();

Overview of Funcdata

This is a container for the sytax tree associated with a single function and all other function specific data. It has an associated start address, function prototype, and local scope.

string &       getName();           // get name of function
Address &      getAddress();        // get Address of function's entry point
int4           numCalls();          // number of subfunctions called by this function
FuncCallSpecs *getCallSpecs(int4 i); // get specs for one of the subfunctions
BlockGraph &   getBasicBlocks();    // get the collection of basic blocks
iterator       beginLoc(Address &);                     // Search for Varnodes in tree
iterator       beginLoc(int4,Address &);                 // based on the Varnode's address
iterator       beginLoc(int4,Address &,Address &,uintm);
iterator       beginDef(uint4,Address &);               // Search for Varnode based on the
                                                        // address of its defining operation

LoadImage

Action

Rule

Translate

Decodes machine instructions and can produce p-code.

int4 oneInstruction(PcodeEmit &,Address &) const; // produce pcode for one instruction

void printAssembly(ostream &,int4,Address &) const; // print the assembly for one instruction

Datatype

Many objects have an associated Datatype, including Varnodes, Symbols, and FuncProtos. A Datatype is built to resemble the type systems of common high-level languages like C or Java.

type_metatype getMetatype();     // categorize type as VOID, UNKNOWN,
                                 // INT, UINT, BOOL, CODE, FLOAT,
                                 // PTR, ARRAY, STRUCT
string &      getName();         // get name of the type
int4          getSize();         // get number of bytes encoding this type

There are base types (in varying sizes) as returned by getMetatype.

enum type_metatype {
  TYPE_VOID,       // void type
  TYPE_UNKNOWN,    // unknown type
  TYPE_INT,        // signed integer
  TYPE_UINT,       // unsigned integer
  TYPE_BOOL,       // boolean
  TYPE_CODE,       // function data
  TYPE_FLOAT,      // floating point
};

Then these can be used to build compound types, with pointer, array, and structure qualifiers.

class TypePointer : public Datatype  {  // pointer to (some other type)
  Datatype *getBase();                  // get Datatype being pointed to
};
class TypeArray : public Datatype {     // array of (some other type)
  Datatype *getBase();                  // get Datatype of array element
};
class TypeStruct : public Datatype {    // structure with fields of (some other types)
  TypeField *getField(int4,int4,int4 *);   // get Datatype of a field
};

TypeFactory

This is a container for Datatypes.

Datatype *findByName(string &);                 // find a Datatype by name
Datatype *getTypeVoid();                        // retrieve common types
Datatype *getTypeChar();
Datatype *getBase(int4 size,type_metatype);
Datatype *getTypePointer(int4,Datatype *,uint4);   // get a pointer to another type
Datatype *getTypeArray(int4,Datatype *);         // get an array of another type

HighVariable

A single high-level variable can move in and out of various memory locations and registers during the course of its lifetime. A HighVariable encapsulates this concept. It is a collection of (low-level) Varnodes, all of which are used to store data for one high-level variable.

int4       numInstances();      // get number of different Varnodes associated
                                // with this variable.
Varnode *  getInstance(int4);    // get (one of the) Varnodes associated with
                                // this variable.
Datatype * getType();           // get Datatype of this variable
Symbol *   getSymbol();         // get Symbol associated with this variable

FuncProto

FuncCallSpecs

Symbol

A particular symbol used for describing memory in the model. This behaves like a normal (high-level language) symbol. It lives in a scope, has a name, and has a Datatype.

string &      getName();        // get the name of the symbol
Datatype *    getType();        // get the Datatype of the symbol
Scope *       getScope();       // get the scope containing the symbol
SymbolEntry * getFirstWholeMap(); // get the (first) SymbolEntry associated
                                // with this symbol

SymbolEntry

This associates a memory location with a particular symbol, i.e. it maps the symbol to memory. Its, in theory, possible to have more than one SymbolEntry associated with a Symbol.

Address &   getAddr();         // get Address of memory location
int4        getSize();         // get size of memory location
Symbol *    getSymbol();       // get Symbol associated with location
RangeList & getUseLimit();     // get range of code addresses for which
                               // this mapping applies

Scope

This is a container for symbols.

SymbolEntry *findAddr(Address &,Address &);          // find a Symbol by address
SymbolEntry *findContainer(Address &,int4,Address &); // find containing symbol
Funcdata *   findFunction(Address &);                // find a function by entry address
Symbol *     findByName(string &);                   // find a Symbol by name
SymbolEntry *queryByAddr(Address &,Address &);       // search for symbols across multiple scopes
SymbolEntry *queryContainer(Address &,int4,Address &);
Funcdata *   queryFunction(Address &);
Scope *      discoverScope(Address &,int4,Address &); // discover scope of an address
string &     getName();                              // get name of scope
Scope *      getParent();                            // get parent scope

Database

This is the container for Scopes.

Scope *getGlobalScope(); // get the root/global scope

Scope *resolveScope(string &,Scope *); // resolve a scope by name

Architecture

This is the repository for all information about a particular processor and executable. It holds the symbol table, the processor translator, the load image, the type database, and the transform engine.

class Architecture {
  Database *     symboltab;   // the symbol table
  Translate *    translate;   // the processor translator
  LoadImage *    loader;      // the executable loadimage
  ActionDatabase allacts;     // transforms which can be performed
  TypeFactory *  types;       // the Datatype database
};