The Olympus native code generation framework consists of a compiler library, with plug-in language parsers and code generators, and a C-based abstract machine that supports dynamic code loading on micro-core devices. Olympus can support a number of different micro-cores, including RISC-V, Xilinx MicroBlaze and Adapteva Epiphany, as well as traditional CPUs such as the x86, ARM, SPARC, PowerPC and MIPS.
Table of Contents
Compiler framework
The Olympus compiler framework plug-in architecture allows key components to be easily selected at runtime. Currently, Vipera includes a parser for vPython and code generators for the Olympus abstract machine (olygen
) and the Graphviz DOT graph desciption language (dotgen
). It is also possible to implement additional compiler phases that perform additional actions on the Abstract Syntax Tree (AST), such as futher code optimisation.
Code generators
The code generator plug-ins consist of a set of functions that generate code from specific AST node types. These functions are defined for each code generation target and Olympus will generate and traverse the AST, applying each function as required. Each generator is assigned to a specific function pointer within an ASTtarget
structure, which is then used by Olympus during the code generation phase. The registration of the generators is performed by the target initialisation function e.g. ast_newOlympusTarget(void)
. The following code snippet from the Olympus abstract machine target initialisation function shows the registration of each of the generator functions and the setFilenames
function used by Olympus to create the relevant files for the specific target. For example, for Olympus abstract machine code, the <program name>_oly_main.c
and <program name>_oly_functions.c
files will be created; the former file contains the main program code and the latter file contains the function code.
target->type = OLYMPUS_CODE;
target->setFilenames = olygenSetFilenames;
target->generators.Program = olygenProgram;
target->generators.Declaration = olygenDecl;
target->generators.Id = olygenId;
target->generators.Literal = olygenLiteral;
target->generators.Assignment = olygenAssign;
target->generators.Lambda = olygenLambda;
target->generators.Beta = olygenBeta;
target->generators.Case = olygenCase;
target->generators.Condition = olygenCondition;
target->generators.While = olygenWhile;
target->generators.For = olygenFor;
target->generators.Return = olygenReturn;
target->generators.Array = olygenArray;
target->generators.ArrayElementIndex = olygenArrayElementIndex;
target->generators.ArrayElementUpdate = olygenArrayElementUpdate;
target->generators.Increment = olygenIncrement;
target->generators.Decrement = olygenDecrement;
target->generators.Arithmetic = olygenArithmetic;
target->generators.Expression = olygenExpression;
Each generator has the following function signature:
typedef RefString (*ASTgenFn)(ASTtarget*, ASTexpression*);
Where ASTtarget
is the current code generation target and ASTexpression
is the current node being processed. For example, the following is the Olympus abstract machine target generator for an assignment:
static RefString olygenAssign(ASTtarget *target, ASTexpression *node) {
RefString rhsString = generateCode(target, RHS(node));
sprintf(target->line,"ST%s(ADDRF(%d,%d),%s);\n",
(IS_ARRAY(LHS(node)) ? "A" : olygenGetTypeName(TYPE(LHS(node)), AST_INITIAL)),
LEVEL(LHS(node)), OFFSET(LHS(node)),
getRefStringChars(rhsString));
freeRefString(rhsString);
return makeRefString(target->line);
}
A key point to note here is that all generators return a RefString
which is a reference-counted string to allow Olympus to manage memory more efficiently. The framework provides a number of functions to create, concatenate and free RefStrings
. The Olympus ASTtarget
structure includes a large buffer target->line
to allow generators to use standard C string handling library functions. The above generator example also shows a number of convenience macros to access elements of the AST e.g. RHS()
, IS_ARRAY()
and OFFSET()
.
NOTE: These convenience macros should be used in preference to accessing the
ASTexpression
structure fields directly to allow the underlying implementation to be changed without requiring updates to generator functions etc.
Whilst Olympus will traverse the AST via the generateCode()
function, the generators need to be aware of the AST structure as they may need to traverse the sub-nodes differently in order to generate the correct code. For example, the olygen
and dotgen
code generators process the program
node differently as Olympus abstract machine code requires that all the functions are declared at the start of the main code file and that the function declarations themselves are placed in the separate functions code file. However, rather than calling generator functions directly, the code generator should pass control back to Olympus by calling the generateCode()
function to process sub-nodes, as shown in the example above. Generally, ASTexpression
nodes have a left-hand and right-hand side sub nodes (accesible via the LHS()
and RHS()
macros), with specialised nodes such as AST_LAMBDA, AST_WHILE etc. having specific value
structures as shown in the definition of ASTexpression
below:
typedef struct expression {
ASToperation op;
ASTtype type;
char *name;
ASTid id;
ASTlevel level;
ASToffset offset;
struct expression *parent;
ASTcount referenced;
struct expression *next;
union {
AST_TYPES
struct array {struct expression **initialisers, *repeat; ASTcount initc;} Array;
struct indexes {struct expression **indexes; ASTcount indexc;} Indexes;
struct expr {struct expression *lhs, *rhs;} Expr;
struct CaseStatement {struct expression **conditions, *dflt; ASTcount condc;} Case;
struct lambda {struct expression **args, *body, *variants; ASTcount argc, size;} Lambda;
struct beta {struct expression **args, *func; ASTcount argc; ASTboolean recursive;} Beta;
struct ForStatement {struct expression *iterator, *list, *block; ASTcount size;} For;
struct program {struct expression *functions; ASTcount fcount, size;} Program;
} value;
} ASTexpression;
The type
and op
fields of ASTexpression
store the node’s type (AST_INTEGER
, AST_REAL
, AST_STRING
, AST_COMPLEX
etc.) and the operation (AST_ADD
, AST_ASSIGNMENT
, AST_WHILE
, AST_LAMBDA
etc.), respectively. For example, the expression 5 + 6
has a type1 of AST_INTEGER
and an operation of AST_ADD
with a LHS(node)
value of 5
and a RHS(node)
value of 6
.
The Olympus compiler will build the AST by appending new nodes to the current node’s next
field. Generally, statements will have next
nodes but expressions
will not i.e. AST_ASSIGNMENT nodes will likely have a next
node but AST_ADD nodes will not.
Olympus abstract machine
The Olympus abstract machine consists of a series of C mnemonics, implemented as macros or functions, and a runtime library that includes memory management. The abstract machine, as the name suggests, models a machine designed to support dynamic object-oriented programming languages with features such as first-class and anonymous functions (lambdas). Olympus has been specifically designed to support scientific kernels on micro-core devices with extremely limited on-chip memory (c. 32KB). Therefore, dynamic type checking has been eliminated as far as possible by the type inferencing performed by the compiler framework.
Memory addressing model
As can be seen in the olygenAssign()
generator function listing above, the Olympus abstract machine uses a lexical level and offset addressing model. This consists of an environment of frames referenced via a display. Every program object2 is declared at a specific offset within a lexical level. For example, all global objects, are declared within lexical level 0 and a function’s local variables would be declared within a scope level greater than or equal to 13. When the code is generated, these levels are translated into scope levels where the level increases outwards from the current scope i.e. within a function, the local scope level is 0 and, for vPython programs, the global scope level is 1. Olympus abstract machine code programs access objects via the display, which is updated as scope levels are changed e.g. entry / exit of functions and recursive function calls. This translation, and the use of the display, ensure that local object lookups are faster than global object lookups. The example below shows the environment for a program that is able to leverage nested functions. Here, the inner function has a single parameter and local vector, the outer function has two parameters, a local complex number and a string, and two real numbers are declared in the global scope.
The example also highlights that fixed-sized compound objects, such as complex numbers, can also reside in the stack frame (frame n-1
) as well as in the heap.
NOTE: The Olympus compiler framework will generate the correct levels and offsets for objects within the AST. The
enterScope()
andleaveScope()
functions within the parser allow scoping levels to be tailored to the specific needs of a particular programming language. For example, the Bison grammar file rules for afor
loop could useenterScope()
andleaveScope()
to ensure that any new declarations are only in-scope within thefor
loop body, as found in programming languages such as Java. The framework also populates theASTexpression id
field with a unique ID number, which can be used over and above the level and offset.
The Olympus abstract machine provides two object addressing mnemonics: ADDRF(level,offset)
and ADDRL(offset)
, where the former returns a foreign object at level
and offset
, and the latter returns a local object at offset
. In general, the olygen
code generator will generate ADDRL()
mnemonics for local variables (level = 0) and will generate ADDRF()
mnemonics when the level is greater than 0. All object access mnemonics use the address returned by these two mnemonics, as shown in the generated code example below, which declares a new object z
at offset 5
and sets its value to that of object at offset 1
plus 50
.
DECLI("z",ADDRL(5),LDI(ADDRL(1))+50);
The object access mnemonics are typed, with the last character of the mmnemonic name denoting the type e.g. DECLI
declares an integer object and DECLR
declares a real (float
) object. The olygenGetTypeName(TYPE(LHS(node)), AST_INITIAL))
function call used in the olygenAssign
generator above returns the correct character representing the type. For example, ‘I’ for integer, ‘R’ for real, ‘S’ for string’ and ‘B’ for boolean types. The same function will also return the C type name (Integer
, Real
, String
, Boolean
etc.) by using AST_CAPITALISE
as the second argument. All the AST types and mnemonics are defined in the ast.h
file in the olympus/compiler/ast
subdirectory and the Olympus abstract machine mnemonic are defined in the mnemonics.h
file in the olympus/runtime
directory.
Memory map
On the device, the Olympus memory map consists of the local on-chip RAM (32KB on the Adapteva Epiphany-III, 64KB on the Xilinx MicroBlaze and up to 128KB on the RISC-V), the shared memory on the host (32MB) and the external memory of the on-host CPython heap (up to 1GB on the Adapteva Parallella). By default, all compound object types (lists, complex numbers etc.) reside in the in-core local heap but there are options to place objects in the memory of other on-chip cores (not shown on the diagram for clarity), the on-host shared and external heaps. The Vipera framework provides the underlying communications functions and module to support access to objects in the CPython heap on the host. The Olympus addressing model also allows compound objects to be placed on the stack, increasing performance by reducing the memory allocation overheads associated with allocating objects in the heap.
-
Arrays are encoded within the
ASTexpression type
field supported by the theSET_ARRAY()
,IS_ARRAY()
andGET_TYPE()
macros. ↩ -
We will refer to all variables (integers, floating point, strings, lists etc.) as objects. ↩
-
Although vPython does not currently support nested functions (inner functions), the Olympus abstract machine has full support for nested and anonymous functions. ↩