Adding unary operations
1 Recap and refactoring
2 Growing the language:   adding (and subtracting) 1
2.1 The new concrete syntax
2.2 Examples
2.3 Enhancing the abstract syntax
2.4 Enhancing the transformations
2.5 Testing
8.8

Adding unary operations

Based on these notes by Ben Lerner. Shortened/simplified a bit, changed the concrete syntax, and cut in two separate notes.

1 Recap and refactoring

Last time, we considered the following miniscule language:

‹expr›: NUMBER

Our abstract syntax was essentially simply

type expr = int64

and our compiler simply placed that integer in the appropriate place in the assembly. But let’s clean up that code somewhat: for a given number (let’s say 4410), we generated the following assembly:

section .text
global our_code_starts_here
our_code_starts_here:
  mov RAX, 4410
  ret

Of all of that code, only one line corresponds to our input program – the rest is scaffolding. Let’s refactor our compiler into two pieces, as follows:

type reg =
  | RAX (* the register where we place answers *)

type arg =
  | Const of int64 (* explicit numeric constants *)
  | Reg of reg (* any named register *)

type instruction =
  | IMov of arg * arg (* Move the value of the right-side arg into the left-arg *)

let asm_to_string (asm : instruction list) : string =
  (* do something to get a string of assembly *)

(* compile_expr is responsible for compiling just a single expression,
   and does not care about the surrounding scaffolding *)
let compile_expr (e : expr) : instruction list =
  [ IMov(Reg(RAX), Const(e)) ]
  ;;

(* compile_prog surrounds a compiled program by whatever scaffolding is needed *)
let compile_prog (e : expr) : string =
  (* compile the program *)
  let instrs = compile_expr e in
  (* convert it to a textual form *)
  let asm_string = asm_to_string instrs in
  (* surround it with the necessary scaffolding *)
  let prelude = "
section .text
global our_code_starts_here
our_code_starts_here:" in
  let suffix = "ret" in
  prelude ^ "\n" ^ asm_string ^ "\n" ^ suffix
  ;;

This is a bit more code than we previously had, but it’s much more usefully organized: compile_prog isn’t going to change 1For a little while! The details of this function will get more elaborate, and we’ll actually wrap this function in a larger pipeline, but the overall signature and purpose of the function will remain unchanged., and compile_expr will simply grow to accomodate more elaborate expression forms.

2 Growing the language: adding (and subtracting) 1

Every time we enhance our source language, we need to consider several things:

  1. Its impact on the concrete syntax of the language

  2. Examples using the new enhancements, so we build intuition of them

  3. Its impact on the abstract syntax and semantics of the language

  4. Any new or changed transformations needed to process the new forms

  5. Executable tests to confirm the enhancement works as intended

Let’s start by adding increment and decrement operations to our language.

2.1 The new concrete syntax

‹expr›: | NUMBER | ( add1 ‹expr› ) | ( sub1 ‹expr› )

2.2 Examples

These are not just example programs in the new language, but pairs of example programs and their intended behavior:

Concrete Syntax

     

Answer

42

     

42

(add1 42)

     

43

(sub1 42)

     

41

(sub1 (add1 (add1 42)))

     

43

2.3 Enhancing the abstract syntax

Based on the examples above, the semantics for add1 and sub1 should be fairly obvious: they evaluate their argument to a number, and add or subtract one from it.

The syntax is likewise trivially extended:

type expr =
  | Num of int64
  | Add1 of expr
  | Sub1 of expr

2.4 Enhancing the transformations

To compile addition and subtraction, we need to enhance our knowledge of assembly. We’ll introduce one new instruction: add <dest>, <val> will increment the destination by the right-side value. (This mutates the destination, so if we still need the old value, we’ll need to have saved it somewhere else, first.) We’ll correspondingly enhance our definition of instruction to represent this new form:

type instruction = ...
  | IAdd of arg * arg (* Increment the left-hand arg by the value of the right-hand arg *)

Do Now!

Given this new instruction, work out the desired assembly for the examples above.

Let’s consider the second example: (add1 42). To compile this, we should load 42 into RAX, and then add 1 to it. Or in symbols,

mov RAX, 42
add RAX, 1

The last example is similar: given ~hl:4:s~(sub1 (~hl:3:s~add1 (~hl:2:s~add1 ~hl:1:s~42~hl:1:e~)~hl:2:e~)~hl:3:e~)~hl:4:e~, we want to load 42, then add 1 to it, then add 1 to that, then subtract 1 from that result. We currently only have add, though, so we’ll add -1 instead of subtracting:

~hl:1:s~mov RAX, 42~hl:1:e~
~hl:2:s~add RAX, 1~hl:2:e~
~hl:3:s~add RAX, 1~hl:3:e~
~hl:4:s~add RAX, -1~hl:4:e~

Notice how each piece of the input program corresponds to a related piece of the output assembly.

Our compile_expr function now looks like this:

let rec compile_expr (e : expr) : instruction list =
  match e with
  | Num n  -> [ IMov(Reg(RAX), Const(n)) ]
  | Add1 e -> (* ??? *)
  | Sub1 e -> (* ??? *)

Do Now!

Try to complete this scaffolding yourself.

The key observation in the hand-written assembly above is that our translations are compositional, that is, they recur on their subpieces, and a translation of a composite expression is simply a function of the translations of its pieces. Moreover, we know that constants always wind up in RAX, and add1 mutates in place, which means that our answers will always be in RAX as desired. So our compiler for this language is

let rec compile_expr (e : expr) : instruction list =
  match e with
  | Num n  -> [ IMov(Reg(RAX), Const(n)) ]
  | Add1 e -> (compile_expr e) @ [ IAdd(Reg(RAX), Const(1L))  ]
  | Sub1 e -> (compile_expr e) @ [ IAdd(Reg(RAX), Const(-1L)) ]

2.5 Testing

Do Now!

Run the given source programs through our compiler pipeline. It should give us exactly the handwritten assembly we intend. If not, debug the compiler until it does.

Exercise

Extend this language with a new operation: (double expr) should produce twice the value of the inner expression. Go through the five stages above: concrete syntax, examples, abstract syntax, transformation, and tests. Do we need any new features of the compiler pipeline, or of assembly, in order to achieve this? What if the operation were (halve expr) instead?

Exercise

There are other assembly instructions we could have added to our output language. The sub instruction is the counterpart to add, but performs subtraction instead. The inc and dec instructions specifically add or subtract 1. Enhance our definition of instruction to include one or more of these new instructions, and modify compile_expr (and any other functions necessary) to take advantage of them.

1For a little while! The details of this function will get more elaborate, and we’ll actually wrap this function in a larger pipeline, but the overall signature and purpose of the function will remain unchanged.