stage 03
The code for this compiler (in the file in02, an input for our stage 02 compiler)
is 2700 lines, quite a bit longer than the previous ones.
To compile it, run ../02/out01 from this directory.
Let's take a look at in03, the example program I've written for it:
B=:hello_world
call :puts
; exit code 0
J=d0
syscall x3c

:hello_world
str Hello, world!
xa
x0

; output null-terminated string in rbx
:puts
R=B
call :strlen
D=A
I=R
J=d1
syscall d1
return

; calculate length of string in rbx
:strlen
; keep pointer to start of string
D=B
I=B
:strlen_loop
C=1I
?C=0:strlen_loop_end
I+=d1
!:strlen_loop
:strlen_loop_end
I-=D
A=I
return
This language looks a lot nicer than the previous one. No more obscure two-letter label names and commands! Furthermore, try changing :strlen_loop on line 31 to a typo like :strlen_lop. You should get:
Bad label 001f
Not only do we get an error message, we also get the line number of the error! It's in hexadecimal, unfortunately (001f is 31), but that's better than nothing.
I spent a while on this compiler (perhaps I went a bit overboard on the features), because the 02 language was the first that was actually pleasant to use! It's much less sophisticated than even most assembly languages, but being able to use labels without having to worry about filling in the offsets later made it way nicer to use than the previous languages.
In addition to in03, this directory also has ex03, which gives examples of all of the instructions supported by this compiler.
Seeing as this is a relatively large compiler, here's an overview of how it works:
functions
Thanks to labels, we can actually use functions in this compiler, without it being a complete nightmare. Functions are called like this:
im
--fu
cl (this would call the function ::fu)
and at the end of each function, we get re, which returns from the function.
I've used the convention of storing return values in rax and passing the argument to a unary function in rbx.
This compiler ended up having a lot of functions, some of them used in all sorts of different places.
execution
Just as with the 02 compiler, we need two passes: the first one computes the address of each label, and the second one uses the correct addresses to write the executable.
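To make the two-pass idea concrete, here's a minimal Python sketch. It is not the compiler's actual code: the instruction sizes, the textual output, and the function names (`run_pass`, `assemble`, `lookup`) are all made up for illustration. Only the structure follows the description above: pass 1 computes label addresses, pass 2 uses them.

```python
JUMP_SIZE = 5   # toy size in bytes for a jump (an assumption for this sketch)
OTHER_SIZE = 1  # toy size in bytes for every other instruction (also assumed)


def lookup(labels, name, line_number):
    """Return the address of a label, or fail with the line number in hex."""
    if name not in labels:
        raise ValueError(f"Bad label {line_number:04x}")
    return labels[name]


def run_pass(lines, labels, first_pass):
    """Walk the source once and return a list of (address, text) pairs."""
    out = []
    address = 0
    for line_number, line in enumerate(lines, start=1):
        if line.startswith(":"):
            # A label definition emits nothing; pass 1 records its address.
            if first_pass:
                labels[line] = address
        elif line.startswith("!"):
            # Unconditional jump: the target may be defined later in the file,
            # so pass 1 uses 0 as a placeholder and only counts bytes.
            target = 0 if first_pass else lookup(labels, line[1:], line_number)
            out.append((address, f"jmp {target:#x}"))
            address += JUMP_SIZE
        else:
            out.append((address, line))
            address += OTHER_SIZE
    return out


def assemble(lines):
    labels = {}
    run_pass(lines, labels, first_pass=True)          # pass 1: label addresses
    return run_pass(lines, labels, first_pass=False)  # pass 2: real output
```

With these toy sizes, `assemble(["!:end", "nop", ":end", "nop"])` resolves the forward jump to address 6, even though `:end` is defined after it is used.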
Each pass is a loop, which starts by incrementing the line number (::L#). Then we read in a line from the source file, in03. This is done one character at a time, until a newline is reached. The line is stored in the buffer ::LI. In the remainder of the program we (mostly) use the fact that the line is newline-terminated, rather than keeping track of how long it is.
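The line-reading step can be sketched in Python like this. The name `read_line` and the use of a Python stream are assumptions for illustration; the real compiler reads into the ::LI buffer with syscalls.

```python
def read_line(stream) -> bytes:
    """Read one byte at a time until a newline (or end of file) is reached."""
    line = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            break  # end of file: return whatever we have (possibly empty)
        line += ch
        if ch == b"\n":
            break  # keep the newline so later code can use it as a terminator
    return bytes(line)
```

Keeping the trailing newline in the returned line mirrors the point above: later code can scan for the newline instead of carrying a length around.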
Once the line is read in, a bunch of tests are performed on it.
We start by looking at the first character: if it's a ;, the line is a comment; if it's a !, it's an unconditional jump; etc.
Failing that, we look at the second character, to see if it's =, +=, -=, etc. If it doesn't match any of them, we use the ::s= (string equals) function, which conveniently lets you set the terminator. We check if the line is equal to "syscall" up to a terminator of ' ' (space) to check if it's a syscall, for example.
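Here's a rough Python sketch of that dispatch order, with a string-equals helper that takes a terminator. The names `str_equals` and `classify` and the returned category strings are hypothetical, and only the handful of cases mentioned above are covered.

```python
def str_equals(a: bytes, b: bytes, terminator: int) -> bool:
    """True if a and b agree byte for byte up to and including the terminator."""
    for x, y in zip(a, b):
        if x != y:
            return False
        if x == terminator:
            return True
    return False  # ran out of bytes before seeing the terminator


def classify(line: bytes) -> str:
    # First character tests.
    if line[:1] == b";":
        return "comment"
    if line[:1] == b"!":
        return "jump"
    # Second character tests.
    if line[1:2] == b"=":
        return "assign"
    if line[1:3] in (b"+=", b"-="):
        return "compound-assign"
    # Fall back to whole-word comparison with a chosen terminator.
    if str_equals(line, b"syscall ", ord(" ")):
        return "syscall"
    return "other"
```

Because the comparison stops at the terminator, `b"syscall x3c\n"` matches `b"syscall "` while `b"syscalls\n"` does not.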
+=, et al.
We can emit the correct instructions for D+=C with:
mov rbx, rdx
mov rax, rcx
add rax, rbx
mov rdx, rax
A similar pattern can be used for -=, &=, etc.
This made it pretty easy to write the implementation of all of these: there's one function for setting rbx to the first operand (::B1), another for setting rax to the second operand (::A2), and another for setting the first operand to rax (::1A). The implementations of +=/-=/etc. just call those three functions, with a bit of stuff in between to perform the corresponding operation.
A similar approach also works for loading/storing values in memory.
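The three-helper pattern can be sketched in Python as follows. This version produces assembly text rather than machine code, handles only register operands, and covers only two commutative operators; the helper names and the `OPERATIONS` table are assumptions for illustration, loosely standing in for ::B1, ::A2 and ::1A.

```python
# Register letters that this document shows mapping to x86-64 registers.
REGISTERS = {"A": "rax", "B": "rbx", "C": "rcx", "D": "rdx"}

# Only commutative operators, so operand order does not matter in this sketch.
OPERATIONS = {"+=": "add rax, rbx", "&=": "and rax, rbx"}


def set_rbx_to_first(first: str) -> str:
    return f"mov rbx, {REGISTERS[first]}"


def set_rax_to_second(second: str) -> str:
    return f"mov rax, {REGISTERS[second]}"


def set_first_to_rax(first: str) -> str:
    return f"mov {REGISTERS[first]}, rax"


def emit(line: str) -> list:
    """Translate e.g. 'D+=C' into four assembly lines."""
    first, op, second = line[0], line[1:3], line[3]
    # Note: a second operand of B would be read after rbx is overwritten;
    # this sketch does not handle that case.
    return [
        set_rbx_to_first(first),
        set_rax_to_second(second),
        OPERATIONS[op],
        set_first_to_rax(first),
    ]
```

Calling `emit("D+=C")` gives the same four instructions listed above; supporting another operator only means adding one entry to the table.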
label list
Instead of a label table, we now have a "label list" (or array if you prefer) at ::LB. A pointer to the current end of the list is stored at ::L$. Each entry is the name of the label, including the :, then a newline, then the 4-byte address.
::ll is used to look up labels. If it's the first pass, ::ll just returns 0. Otherwise, it looks up the label by comparing it to each entry using s= with a terminator of '\n'. If no label matches, we get an error.
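As a rough model of this layout, here's a Python sketch that stores entries in a flat byte buffer and scans it linearly. The function names are hypothetical, and little-endian byte order for the address is an assumption (it's the native order on x86-64, but the text above doesn't say).

```python
import struct


def add_label(label_list: bytearray, name: bytes, address: int) -> None:
    """Append one entry: the name (with its ':'), a newline, a 4-byte address."""
    # Little-endian byte order is an assumption of this sketch.
    label_list.extend(name + b"\n" + struct.pack("<I", address))


def lookup_label(label_list: bytearray, name: bytes, first_pass: bool) -> int:
    """Linear search through the list; pass 1 always answers 0."""
    if first_pass:
        return 0
    pos = 0
    while pos < len(label_list):
        end = label_list.index(b"\n", pos)  # the name ends at the newline
        (address,) = struct.unpack_from("<I", label_list, end + 1)
        if bytes(label_list[pos:end]) == name:
            return address
        pos = end + 1 + 4  # skip the newline and the 4 address bytes
    raise LookupError("Bad label")
```

Since the address always takes exactly 4 bytes after the newline, the scan stays in sync even if an address happens to contain a newline byte.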
alignment
A lot of data used in this program is not correctly aligned, e.g. 8-byte values are not always stored at an address that is a multiple of 8. This would be a problem on some processors, but x86-64 can handle it. It's still not a good idea in practice, since reading unaligned memory can be slower. But we're not really concerned about performance here, and it would be a bit finicky to align everything correctly.
However, I have introduced align into this language, which you can put before a label to ensure that its address is aligned to 8 bytes.
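One common way to align an address to 8 bytes is to round it up to the next multiple of 8. The Python sketch below shows that arithmetic; it's an assumption about the technique, not a description of how the compiler actually implements align.

```python
def align_up(address: int, alignment: int = 8) -> int:
    """Round address up to a multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)
```

For example, `align_up(13)` gives 16, while an address that is already a multiple of 8 is left unchanged.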
errors
Errors are handled in functions beginning with !, e.g. ::!n for "bad number". Each of these ends up calling ::er. ::er prints a string specific to the type of error, then converts the line number to a string, and prints it. The line number is always converted to a 4-digit hexadecimal number. This means it won't fully work past 65,535 lines, but let's hope we don't need to write any programs that long!
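The message format can be modeled in Python like this. The function name is made up, and keeping only the low 16 bits of the line number is an assumption about what "won't fully work" means past 65,535 lines.

```python
def format_error(message: str, line_number: int) -> str:
    """Build an error string such as 'Bad label 001f'."""
    # Assumption: only the low 16 bits fit in the 4 hexadecimal digits.
    return f"{message} {line_number & 0xffff:04x}"
```

So an error on line 31 is reported as `Bad label 001f`, matching the output shown earlier, and under the assumption above line 65,536 would be reported as 0000.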
limitations
Functions in this 03 language will probably overwrite the previous values of registers. This can make it kind of annoying to call functions, since you need to make sure you store away any information you'll need after the function. And the language definitely won't be as nice to use as something with real variables. But overall, I'm very happy with this compiler, especially considering it's written in a language with 2-letter label names.