decompiler  1.0.0
Core Classes

Introduction

The decompiler attempts to translate from low-level representations of computer programs into high-level representations. Thus it needs to model concepts from both the low-level machine hardware domain and from the high-level software programming domain.

Understanding the classes in the source code that implement these models is the quickest way to gain an overall understanding of the code.

We list all these fundamental classes here, loosely grouped as follows. One set of classes describes the Syntax Trees, which are built up from the original p-code and transformed during the decompiler's simplification process. The Translation classes do the actual building of the syntax trees from binary executables, and the Transformation classes do the actual work of transforming the syntax trees. Finally, there are the High-level classes, which represent the information recovered by the decompiler and describe familiar software development concepts like datatypes, prototypes, symbols, and variables.

Syntax Trees

Translation

Transformation

High-level Representation

Overview of SeqNum

A sequence number is a form of extended address that distinguishes multiple p-code operations associated with the same address. There is a normal Address field. There is a time field, a static value determined when an operation is created, that guarantees the uniqueness of the SeqNum. There is also an order field, which preserves order information about operations within a basic block. This value may change if the syntax tree is manipulated.

Address & getAddr(); // get the Address field
uintm getTime(); // get the time field
uintm getOrder(); // get the order field
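The split between the static time field and the mutable order field can be illustrated with a small sketch. MiniSeqNum, sameOp, precedes, and renumber are hypothetical stand-ins invented for this example; they are not part of the decompiler, and the real SeqNum stores an Address rather than a plain integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for SeqNum; not the decompiler's class.
struct MiniSeqNum {
  std::uint64_t addr;   // stand-in for the Address field
  std::uint32_t time;   // static value assigned at creation
  std::uint32_t order;  // position within a basic block; may change
};

// Identity uses only the fields that never change after creation.
inline bool sameOp(const MiniSeqNum &a, const MiniSeqNum &b) {
  return a.addr == b.addr && a.time == b.time;
}

// Relative position inside one block uses only the order field.
inline bool precedes(const MiniSeqNum &a, const MiniSeqNum &b) {
  return a.order < b.order;
}

// After ops in a block are rearranged, refresh order and leave time alone.
inline void renumber(std::vector<MiniSeqNum> &block) {
  assert(block.size() <= std::numeric_limits<std::uint32_t>::max());
  for (std::size_t i = 0; i < block.size(); ++i)
    block[i].order = static_cast<std::uint32_t>(i);
}
```

In this sketch, swapping two entries and calling renumber changes their order values while sameOp still recognizes each one by its address and time.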

Overview of PcodeOp

A single operation in the p-code language. It has, at most, one Varnode output, and some number of Varnode inputs. The inputs are operated on depending on the opcode of the instruction, producing the output.

OpCode code(); // get the opcode for this op
Address & getAddr(); // get Address of the associated processor instruction
// which generated this op.
SeqNum & getSeqNum(); // get the full unique identifier for this op
int4 numInput(); // get number of Varnode inputs to this op
Varnode * getOut(); // get Varnode output
Varnode * getIn(int4 i); // get (one of the) Varnode inputs
BlockBasic * getParent(); // get basic block containing this op
bool isDead(); // op may no longer be in syntax tree
bool isCall(); // various categories of op
bool isBranch();
bool isBoolOutput();
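As a rough illustration of how the input and output accessors are typically combined, the following sketch counts how many input slots of an operation read a given Varnode. MiniVarnode, MiniPcodeOp, and countReads are hypothetical stand-ins that mirror only the names numInput, getIn, and getOut from the listing above; the real classes carry far more state.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins; they mirror only the accessors listed above.
struct MiniVarnode { int id; };

class MiniPcodeOp {
  MiniVarnode *output;                 // at most one output, may be null
  std::vector<MiniVarnode *> inputs;   // some number of inputs
public:
  MiniPcodeOp(MiniVarnode *out, std::vector<MiniVarnode *> in)
    : output(out), inputs(std::move(in)) {}
  int numInput() const { return static_cast<int>(inputs.size()); }
  MiniVarnode *getOut() const { return output; }
  MiniVarnode *getIn(int i) const {
    assert(i >= 0 && i < numInput());  // slot index must be in range
    return inputs[i];
  }
};

// Count the input slots of op that read the Varnode vn.
inline int countReads(const MiniPcodeOp &op, const MiniVarnode *vn) {
  int count = 0;
  for (int i = 0; i < op.numInput(); ++i)
    if (op.getIn(i) == vn)
      ++count;
  return count;
}
```

The same index-based loop shape applies to any pass that needs to inspect every input of an operation.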

Overview of BlockBasic

A sequence of PcodeOps with a single path of execution.

int4 sizeOut(); // get number of paths flowing out of this block
int4 sizeIn(); // get number of paths flowing into this block
BlockBasic *getIn(int4 i); // get (one of the) blocks flowing into this
BlockBasic *getOut(int4 i); // get (one of the) blocks flowing out of this
SeqNum & getStart(); // get SeqNum of first operation in block
SeqNum & getStop(); // get SeqNum of last operation in block
BlockBasic *getImmedDom(); // get immediate dominator block
iterator beginOp(); // get iterator to first PcodeOp in block
iterator endOp();
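One common use of an immediate-dominator accessor is a dominance test that walks up the dominator chain. The sketch below uses a hypothetical MiniBlock type and assumes the entry block reports a null immediate dominator; whether the real getImmedDom behaves that way is an assumption, not something stated here.

```cpp
#include <cassert>

// Hypothetical stand-in for a basic block; only the dominator link is modeled.
struct MiniBlock {
  MiniBlock *immedDom = nullptr;   // assumed null for the entry block
  MiniBlock *getImmedDom() const { return immedDom; }
};

// Return true if block a dominates block b (every block dominates itself).
inline bool dominates(const MiniBlock *a, const MiniBlock *b) {
  assert(a != nullptr && b != nullptr);
  for (const MiniBlock *cur = b; cur != nullptr; cur = cur->getImmedDom())
    if (cur == a)
      return true;
  return false;
}
```

For a diamond-shaped graph, the entry block dominates the join block, while neither branch block does.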

Overview of Funcdata

This is a container for the syntax tree associated with a single function and all other function-specific data. It has an associated start address, function prototype, and local scope.

string & getName(); // get name of function
Address & getAddress(); // get Address of function's entry point
int4 numCalls(); // number of subfunctions called by this function
FuncCallSpecs *getCallSpecs(int4 i); // get specs for one of the subfunctions
BlockGraph & getBasicBlocks(); // get the collection of basic blocks
iterator beginLoc(Address &); // Search for Varnodes in tree
iterator beginLoc(int4,Address &); // based on the Varnode's address
iterator beginLoc(int4,Address &,Address &,uintm);
iterator beginDef(uint4,Address &); // Search for Varnode based on the
// address of its defining operation

LoadImage

Action

Rule

Translate

Decodes machine instructions and can produce p-code.

int4 oneInstruction(PcodeEmit &,Address &) const; // produce pcode for one instruction
void printAssembly(ostream &,int4,Address &) const; // print the assembly for one instruction

Datatype

Many objects have an associated Datatype, including Varnodes, Symbols, and FuncProtos. A Datatype is built to resemble the type systems of common high-level languages like C or Java.

type_metatype getMetatype(); // categorize type as VOID, UNKNOWN,
// INT, UINT, BOOL, CODE, FLOAT,
// PTR, ARRAY, STRUCT
string & getName(); // get name of the type
int4 getSize(); // get number of bytes encoding this type

There are base types (in varying sizes) as returned by getMetatype.

enum type_metatype {
TYPE_VOID, // void type
TYPE_UNKNOWN, // unknown type
TYPE_INT, // signed integer
TYPE_UINT, // unsigned integer
TYPE_BOOL, // boolean
TYPE_CODE, // function data
TYPE_FLOAT, // floating point
};

Then these can be used to build compound types, with pointer, array, and structure qualifiers.

class TypePointer : public Datatype { // pointer to (some other type)
Datatype *getBase(); // get Datatype being pointed to
};
class TypeArray : public Datatype { // array of (some other type)
Datatype *getBase(); // get Datatype of array element
};
class TypeStruct : public Datatype { // structure with fields of (some other types)
TypeField *getField(int4,int4,int4 *); // get Datatype of a field
};
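Because compound types wrap other types, code often needs to peel pointer and array layers until it reaches a base type. The following is a minimal sketch of that loop using hypothetical stand-in classes (MiniType, MiniPointer, MiniArray, and the MINI_ enumerators); only the names getMetatype and getBase come from the listings above.

```cpp
#include <cassert>

// Hypothetical metatype tags; the real enum has more entries.
enum MiniMeta { MINI_INT, MINI_PTR, MINI_ARRAY };

class MiniType {
  MiniMeta meta;
public:
  explicit MiniType(MiniMeta m) : meta(m) {}
  virtual ~MiniType() = default;
  MiniMeta getMetatype() const { return meta; }
};

class MiniPointer : public MiniType {  // pointer to (some other type)
  MiniType *base;
public:
  explicit MiniPointer(MiniType *b) : MiniType(MINI_PTR), base(b) {}
  MiniType *getBase() const { return base; }
};

class MiniArray : public MiniType {    // array of (some other type)
  MiniType *base;
public:
  explicit MiniArray(MiniType *b) : MiniType(MINI_ARRAY), base(b) {}
  MiniType *getBase() const { return base; }
};

// Strip pointer and array layers; depth receives the number of layers removed.
inline const MiniType *stripToBase(const MiniType *t, int &depth) {
  assert(t != nullptr);
  depth = 0;
  for (;;) {
    if (t->getMetatype() == MINI_PTR)
      t = static_cast<const MiniPointer *>(t)->getBase();
    else if (t->getMetatype() == MINI_ARRAY)
      t = static_cast<const MiniArray *>(t)->getBase();
    else
      return t;
    ++depth;
  }
}
```

The metatype is checked before each downcast, so the static_cast is only applied to an object of the matching subclass.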

TypeFactory

This is a container for Datatypes.

Datatype *findByName(string &); // find a Datatype by name
Datatype *getTypeVoid(); // retrieve common types
Datatype *getTypeChar();
Datatype *getBase(int4 size,type_metatype);
Datatype *getTypePointer(int4,Datatype *,uint4); // get a pointer to another type
Datatype *getTypeArray(int4,Datatype *); // get an array of another type

HighVariable

A single high-level variable can move in and out of various memory locations and registers during the course of its lifetime. A HighVariable encapsulates this concept. It is a collection of (low-level) Varnodes, all of which are used to store data for one high-level variable.

int4 numInstances(); // get number of different Varnodes associated
// with this variable.
Varnode * getInstance(int4); // get (one of the) Varnodes associated with
// this variable.
Datatype * getType(); // get Datatype of this variable
Symbol * getSymbol(); // get Symbol associated with this variable

FuncProto

FuncCallSpecs

Symbol

A particular symbol used for describing memory in the model. This behaves like a normal (high-level language) symbol. It lives in a scope, has a name, and has a Datatype.

string & getName(); // get the name of the symbol
Datatype * getType(); // get the Datatype of the symbol
Scope * getScope(); // get the scope containing the symbol
SymbolEntry * getFirstWholeMap(); // get the (first) SymbolEntry associated
// with this symbol

SymbolEntry

This associates a memory location with a particular symbol, i.e. it maps the symbol to memory. It is, in theory, possible to have more than one SymbolEntry associated with a Symbol.

Address & getAddr(); // get Address of memory location
int4 getSize(); // get size of memory location
Symbol * getSymbol(); // get Symbol associated with location
RangeList & getUseLimit(); // get range of code addresses for which
// this mapping applies

Scope

This is a container for symbols.

SymbolEntry *findAddr(Address &,Address &); // find a Symbol by address
SymbolEntry *findContainer(Address &,int4,Address &); // find containing symbol
Funcdata * findFunction(Address &); // find a function by entry address
Symbol * findByName(string &); // find a Symbol by name
SymbolEntry *queryByAddr(Address &,Address &); // search for symbols across multiple scopes
SymbolEntry *queryContainer(Address &,int4,Address &);
Funcdata * queryFunction(Address &);
Scope * discoverScope(Address &,int4,Address &); // discover scope of an address
string & getName(); // get name of scope
Scope * getParent(); // get parent scope
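Since each scope has a parent, a lookup that misses locally can continue outward toward the global scope. The sketch below shows that walk using a hypothetical MiniScope class with integer symbol ids; it illustrates the containment idea only and is not a description of how the query methods above are implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a scope: a named symbol container with a parent.
class MiniScope {
  std::string name;
  MiniScope *parent;                     // null for the global scope
  std::map<std::string, int> symbols;    // symbol name -> hypothetical id
public:
  MiniScope(std::string n, MiniScope *p) : name(std::move(n)), parent(p) {}
  void add(const std::string &sym, int id) { symbols[sym] = id; }
  // Local lookup only; returns null if this scope does not hold the name.
  const int *findByName(const std::string &sym) const {
    auto it = symbols.find(sym);
    return it == symbols.end() ? nullptr : &it->second;
  }
  MiniScope *getParent() const { return parent; }
  const std::string &getName() const { return name; }
};

// Walk from start toward the root and return the first scope holding sym.
inline const MiniScope *owningScope(const MiniScope *start, const std::string &sym) {
  assert(start != nullptr);
  for (const MiniScope *s = start; s != nullptr; s = s->getParent())
    if (s->findByName(sym) != nullptr)
      return s;
  return nullptr;
}
```

A name defined in both a function scope and the global scope resolves to the function scope when the walk starts there, which matches ordinary shadowing in high-level languages.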

Database

This is the container for Scopes.

Scope *getGlobalScope(); // get the root/global scope
Scope *resolveScope(string &,Scope *); // resolve a scope by name

Architecture

This is the repository for all information about a particular processor and executable. It holds the symbol table, the processor translator, the load image, the type database, and the transform engine.

class Architecture {
Database * symboltab; // the symbol table
Translate * translate; // the processor translator
LoadImage * loader; // the executable loadimage
ActionDatabase allacts; // transforms which can be performed
TypeFactory * types; // the Datatype database
};