# stage 03
The code for this compiler (the file `in02`, an input for our [stage 02 compiler](../02/README.md))
is 2700 lines, quite a bit larger than the previous ones. And as we'll see, it's a lot more powerful too.
To compile it, run `../02/out01` from this directory.
Let's take a look at `in03`, the example program I've written for it:
```
B=:hello_world
call :puts
; exit code 0
J=d0
syscall x3c

:hello_world
str Hello, world!
xa
x0

; output null-terminated string in rbx
:puts
R=B
call :strlen
D=A
I=R
J=d1
syscall d1
return

; calculate length of string in rbx
:strlen
; keep pointer to start of string
D=B
I=B
:strlen_loop
C=1I
?C=0:strlen_loop_end
I+=d1
!:strlen_loop
:strlen_loop_end
I-=D
A=I
return
```
This language looks a lot nicer than the previous one. No more obscure two-letter label names
and commands! Furthermore, try changing `:strlen_loop` on line 31
to a typo like `:strlen_lop`. You should get:
```
Bad label 001f
```
Not only do we get an error message, we also get the line number
of the error! It's in hexadecimal, unfortunately (`1f` is 31), but that's
better than nothing.

I spent a while on this compiler (perhaps I went a bit overboard
on the features), because the 02 language
was the first that was actually pleasant to use!
It's much less sophisticated than even most assembly languages,
but being able to use labels without having to worry about filling
in the offsets later made it way nicer to use than the previous
languages.

In addition to `in03`, this directory also has `ex03`,
which gives examples of all of the instructions supported by this compiler.

Seeing as this is a relatively large compiler,
here is an overview of how it works:
## functions

Thanks to labels, we can actually use functions in this compiler, without
it being a complete nightmare. Functions are called like this:
```
im
--fu
cl (this would call the function ::fu)
```
and at the end of each function, we get `re`, which returns from the function.
I've used the convention of storing return values in `rax` and
passing the argument to a unary function in `rbx`.

This compiler ended up having a lot of functions, some of them used in all sorts
of different places.
## execution

Just as with the 02 compiler, we need two passes:
the first one computes the address of each label,
and the second one uses the correct addresses to
write the executable.
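To make the two-pass idea concrete, here is a minimal Python sketch. It is only an illustration, not the compiler's actual code: the `assemble` function, the fixed 4-byte size per line, and the toy encoding (a jump is "encoded" as its target address, everything else as `None`) are all assumptions invented for the example.

```python
def assemble(lines, base=0):
    """Toy two-pass assembler (hypothetical, for illustration only).

    Lines starting with ':' define labels; lines like '!:name' are jumps.
    Every non-label line is assumed to occupy 4 bytes.
    """
    labels = {}
    output = []
    for second_pass in (False, True):
        address = base
        for line in lines:
            if line.startswith(":"):
                if not second_pass:
                    labels[line] = address  # pass 1: record where the label is
                continue
            if second_pass:
                if line.startswith("!"):
                    # pass 2: the address is known, even for forward jumps
                    output.append(labels[line[1:]])
                else:
                    output.append(None)  # placeholder for any other instruction
            address += 4
    return labels, output


labels, output = assemble(["!:end", "A=B", ":end", "return"])
```

The jump on the first line refers to a label that is only defined two lines later, which is exactly the case a single pass can't handle.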

Each pass is a loop, which starts by incrementing
the line number (`::L#`). Then we read in a line
from the source file, `in03`. This is done one character
at a time, until a newline is reached. The line is stored
in the buffer `::LI`. In the remainder of the program we
(mostly) use the fact that the line is newline-terminated,
rather than keeping track of how long it is.

Once the line is read in, a bunch of tests are performed on it.
We start by looking at the first character: if it's a `;`,
the line is a comment; if it's a `!`, it's an unconditional jump; etc.
Failing that, we look at the second character, to see if it's
`=`, `+=`, `-=`, etc. If it doesn't match any of them, we use
the `::s=` (string equals) function, which conveniently lets you
set the terminator. We check if the line is equal to `"syscall"`
up to a terminator of `' '` to check if it's a syscall, for example.
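The order of these tests can be sketched in Python roughly as follows. The names `str_equal` and `classify` are made up for this example, the set of categories is incomplete, and treating the end of a Python string as a terminator is a convenience of the sketch, not something the real `::s=` is known to do.

```python
def str_equal(a, b, terminator):
    """Compare two strings up to (and including) the first `terminator`.

    Running off the end of either string counts as hitting the terminator
    (an assumption of this sketch).
    """
    i = 0
    while True:
        ca = a[i] if i < len(a) else terminator
        cb = b[i] if i < len(b) else terminator
        if ca != cb:
            return False
        if ca == terminator:
            return True
        i += 1


def classify(line):
    """Classify a newline-terminated source line, in the order described above."""
    # 1. tests on the first character
    if line[0] == ";":
        return "comment"
    if line[0] == "!":
        return "jump"
    # 2. tests starting at the second character
    if line[1:2] == "=":
        return "assign"
    if line[1:3] in ("+=", "-="):
        return "arithmetic"
    # 3. keyword tests using string-equals with a chosen terminator
    if str_equal(line, "syscall ", " "):
        return "syscall"
    return "unknown"
```

For example, `classify("I+=d1\n")` falls through the first-character tests and is recognized by its second and third characters, while `classify("syscall x3c\n")` is only recognized by the final string comparison.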
## `+=`, et al.

We can emit the correct instructions for `D+=C` with:

- `mov rbx, rdx`
- `mov rax, rcx`
- `add rax, rbx`
- `mov rdx, rax`

A similar pattern can be used for `-=`, `&=`, etc.
This made it pretty easy to write the implementation of all of these:
there's one function for setting `rbx` to the first operand (`::B1`),
another for setting `rax` to the second operand (`::A2`), and another for
setting the first operand to `rax` (`::1A`). The implementations of
`+=`/`-=`/etc. just call those three functions, with a bit of stuff in between
to perform the corresponding operation.
A similar approach also works for loading/storing values in memory.
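Here is a rough Python sketch of that three-helper structure, producing assembly text instead of machine code. The function names, the `REGISTERS` and `OPERATIONS` tables, and the restriction to register operands are assumptions of the example; only the roles of `::B1`, `::A2` and `::1A` come from the description above.

```python
# Only a few registers, enough for the example (hypothetical table).
REGISTERS = {"A": "rax", "B": "rbx", "C": "rcx", "D": "rdx"}

# The "bit of stuff in between" for two commutative operations.
OPERATIONS = {"+=": "add rax, rbx", "&=": "and rax, rbx"}


def load_first_into_rbx(line):
    """Plays the role of ::B1."""
    return f"mov rbx, {REGISTERS[line[0]]}"


def load_second_into_rax(line):
    """Plays the role of ::A2 (register operands only in this sketch)."""
    return f"mov rax, {REGISTERS[line[3]]}"


def store_rax_into_first(line):
    """Plays the role of ::1A."""
    return f"mov {REGISTERS[line[0]]}, rax"


def compile_compound(line):
    """Emit the instruction sequence for a line such as 'D+=C'."""
    return [
        load_first_into_rbx(line),
        load_second_into_rax(line),
        OPERATIONS[line[1:3]],
        store_rax_into_first(line),
    ]
```

`compile_compound("D+=C")` yields the four instructions listed above. The sketch leaves out `-=` on purpose: subtraction is not commutative, so the step in the middle has to take the operand order into account rather than being a single `sub rax, rbx`.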
## label list

Instead of a label table, we now have a "label list" (or array
if you prefer) at `::LB`.
A pointer to the current end of the list is stored at `::L$`.
Each entry is the name of the label, including the `:`, then a newline,
then the 4-byte address.
`::ll` is used to look up labels. If it's the first pass,
`::ll` just returns 0. Otherwise, it looks up the label by
comparing it to each entry using `::s=` with a terminator of `'\n'`.
If no label matches, we get an error.
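The entry layout and the linear search can be sketched in Python like this. The functions `add_label` and `lookup_label` are hypothetical stand-ins (the second one for `::ll`), and storing the address in little-endian byte order is an assumption; the text above only says that it takes 4 bytes.

```python
import struct


def add_label(label_list, name, address):
    """Append one entry: the name (with ':'), a newline, a 4-byte address."""
    label_list.extend(name.encode() + b"\n" + struct.pack("<I", address))


def lookup_label(label_list, name, first_pass=False):
    """Linear search through the list, similar in spirit to ::ll."""
    if first_pass:
        return 0  # addresses are not known yet on the first pass
    wanted = name.encode() + b"\n"
    pos = 0
    while pos < len(label_list):
        end = label_list.index(b"\n", pos) + 1  # just past the name's newline
        if label_list[pos:end] == wanted:
            return struct.unpack_from("<I", label_list, end)[0]
        pos = end + 4  # skip the address and move on to the next entry
    raise KeyError(f"Bad label {name}")


label_list = bytearray()
add_label(label_list, ":puts", 0x0A0A)
add_label(label_list, ":strlen", 0x400123)
```

Because each search for the newline starts at the beginning of a name and the address is skipped by its fixed size, an address byte that happens to equal `'\n'` (as in `0x0A0A` here) does not confuse the lookup.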
## alignment

A lot of data used in this program is
[not correctly aligned](https://en.wikipedia.org/wiki/Bus_error#Unaligned_access), e.g.
8-byte values are not always stored at an address that is a multiple of 8.
This would be a problem on some processors, but x86-64 can handle it.
It's still not a good idea in practice: reading unaligned memory
can be slower. But we're not really concerned about performance here,
and it would be a bit finicky to align everything correctly.
However, I have introduced `align` into this language,
which you can put before a label to ensure that its address is aligned
to 8 bytes.
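The arithmetic behind aligning an address is small enough to show directly. This Python sketch only demonstrates rounding up to a multiple of 8; the name `align_up` is made up, and how `align` actually pads the output is not shown here.

```python
def align_up(address, alignment=8):
    """Round `address` up to the next multiple of `alignment` (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)
```

An address that is already a multiple of 8 is left unchanged, and anything else moves up to the next multiple, so at most 7 bytes of padding are ever needed.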
## errors

Errors are handled in functions beginning with `!`, e.g. `::!n` for "bad number".
Each of these ends up calling `::er`. `::er` prints
a string specific to the type of error, then
converts the line number to a string, and prints it.
The line number is always converted to a 4-digit hexadecimal number.
This means it won't fully work past 65,535 lines, but
let's hope we don't need to write any programs that long!
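As an illustration of the message format, here is a small Python sketch. The function `format_error` is hypothetical, and keeping only the low 16 bits of the line number is an assumption about what "won't fully work" means in practice.

```python
def format_error(message, line_number):
    """Build an error line such as 'Bad label 001f'."""
    # Assumption: only the low 16 bits (4 hex digits) of the line number survive.
    return f"{message} {line_number & 0xFFFF:04x}"
```

To get back to a decimal line number from a message like `Bad label 001f`, something like `int("001f", 16)` gives 31.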
## limitations

Functions in this 03 language will probably overwrite the previous values
of registers. This can make it kind of annoying to call functions, since
you need to make sure you store away any information you'll need after the function.
And the language definitely won't be as nice to use as something with real variables. But overall,
I'm very happy with this compiler, considering it's written in a language with 2-letter label
names.