Based on these notes by Ben Lerner. Shortened/simplified a bit, changed the concrete syntax, and cut in two separate notes.
Last time, we considered the following miniscule language:
Our abstract syntax was essentially simply
type expr = int64
and our compiler simply placed that integer in the appropriate place in the assembly. But let’s clean up that code somewhat: for a given number (let’s say 4410), we generated the following assembly:
section .text global our_code_starts_here our_code_starts_here: mov RAX, 4410 ret
Of all of that code, only one line corresponds to our input program – the rest is scaffolding. Let’s refactor our compiler into two pieces, as follows:
type reg = | RAX (* the register where we place answers *) type arg = | Const of int64 (* explicit numeric constants *) | Reg of reg (* any named register *) type instruction = | IMov of arg * arg (* Move the value of the right-side arg into the left-arg *) let asm_to_string (asm : instruction list) : string = (* do something to get a string of assembly *) (* compile_expr is responsible for compiling just a single expression, and does not care about the surrounding scaffolding *) let compile_expr (e : expr) : instruction list = [ IMov(Reg(RAX), Const(e)) ] ;; (* compile_prog surrounds a compiled program by whatever scaffolding is needed *) let compile_prog (e : expr) : string = (* compile the program *) let instrs = compile_expr e in (* convert it to a textual form *) let asm_string = asm_to_string instrs in (* surround it with the necessary scaffolding *) let prelude = " section .text global our_code_starts_here our_code_starts_here:" in let suffix = "ret" in prelude ^ "\n" ^ asm_string ^ "\n" ^ suffix ;;
This is a bit more code than we previously had, but it’s much more usefully
compile_prog isn’t going to change 1For a little while!
The details of this function will get more elaborate, and we’ll actually wrap
this function in a larger pipeline, but the overall signature and purpose of
the function will remain unchanged., and
compile_expr will simply grow to
accomodate more elaborate expression forms.
Every time we enhance our source language, we need to consider several things:
Its impact on the concrete syntax of the language
Examples using the new enhancements, so we build intuition of them
Its impact on the abstract syntax and semantics of the language
Any new or changed transformations needed to process the new forms
Executable tests to confirm the enhancement works as intended
Let’s start by adding increment and decrement operations to our language.
These are not just example programs in the new language, but pairs of example programs and their intended behavior:
Based on the examples above, the semantics for
should be fairly obvious: they evaluate their argument to a number, and add or
subtract one from it.
The syntax is likewise trivially extended:
type expr = | Num of int64 | Add1 of expr | Sub1 of expr
To compile addition and subtraction, we need to enhance our knowledge of
assembly. We’ll introduce one new instruction:
add <dest>, <val> will
increment the destination by the right-side value. (This mutates the
destination, so if we still need the old value, we’ll need to have saved it
somewhere else, first.) We’ll correspondingly enhance our definition of
instruction to represent this new form:
type instruction = ... | IAdd of arg * arg (* Increment the left-hand arg by the value of the right-hand arg *)
Given this new instruction, work out the desired assembly for the examples above.
Let’s consider the second example:
(add1 42). To compile this, we
should load 42 into
RAX, and then add 1 to it. Or in symbols,
mov RAX, 42 add RAX, 1
The last example is similar: given
~hl:4:s~(sub1 (~hl:3:s~add1 (~hl:2:s~add1 ~hl:1:s~42~hl:1:e~)~hl:2:e~)~hl:3:e~)~hl:4:e~,
we want to
load 42, then add 1 to it, then add 1 to that, then subtract 1 from that
result. We currently only have
add, though, so we’ll add -1 instead of subtracting:
~hl:1:s~mov RAX, 42~hl:1:e~ ~hl:2:s~add RAX, 1~hl:2:e~ ~hl:3:s~add RAX, 1~hl:3:e~ ~hl:4:s~add RAX, -1~hl:4:e~
Notice how each piece of the input program corresponds to a related piece of the output assembly.
compile_expr function now looks like this:
let rec compile_expr (e : expr) : instruction list = match e with | Num n -> [ IMov(Reg(RAX), Const(n)) ] | Add1 e -> (* ??? *) | Sub1 e -> (* ??? *)
Try to complete this scaffolding yourself.
The key observation in the hand-written assembly above is that our translations
are compositional, that is, they recur on their subpieces, and a
translation of a composite expression is simply a function of the translations
of its pieces. Moreover, we know that constants always wind up in
add1 mutates in place, which means that our answers will always be in
RAX as desired. So our compiler for this language is
let rec compile_expr (e : expr) : instruction list = match e with | Num n -> [ IMov(Reg(RAX), Const(n)) ] | Add1 e -> (compile_expr e) @ [ IAdd(Reg(RAX), Const(1L)) ] | Sub1 e -> (compile_expr e) @ [ IAdd(Reg(RAX), Const(-1L)) ]
Run the given source programs through our compiler pipeline. It should give us exactly the handwritten assembly we intend. If not, debug the compiler until it does.
Extend this language with a new operation:
(double expr)should produce twice the value of the inner expression. Go through the five stages above: concrete syntax, examples, abstract syntax, transformation, and tests. Do we need any new features of the compiler pipeline, or of assembly, in order to achieve this? What if the operation were
There are other assembly instructions we could have added to our output language. The
subinstruction is the counterpart to
add, but performs subtraction instead. The
decinstructions specifically add or subtract
1. Enhance our definition of
instructionto include one or more of these new instructions, and modify
compile_expr(and any other functions necessary) to take advantage of them.
1For a little while! The details of this function will get more elaborate, and we’ll actually wrap this function in a larger pipeline, but the overall signature and purpose of the function will remain unchanged.