# stage 02

The compiler for this stage is in the file `in01`, an input for our previous compiler.
The specifics of how this compiler works are in the comments in that file, but here I'll
give an overview.
Let's take a look at `in02`, an example input file for this compiler:
```
jm
:-co jump to code
::hw
'H
'e
'l
'l
'o
',
' 
'w
'o
'r
'l
'd
'!
\n
::he end of hello world
::co start of code
//
// now we'll calculate the length of the hello world string
// by subtracting hw from he.
//
im
--he
BA
im
--hw
nA
+B
DA put length in rdx
// okay now we can write it
im
##1.
JA set rdi to 1 (stdout)
im
--hw
IA set rsi to a pointer to "Hello, world!\n"
im
##1. write
sy
im
##0. exit code 0
JA
im
##3c. exit = syscall 0x3c
sy
```

You can try adding more characters to the hello world message, and it'll just work;
the length of the text is computed automatically!

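To see why the length comes out right, here is a rough Python sketch of the arithmetic in the `im` / `--he` / `BA` / `im` / `--hw` / `nA` / `+B` sequence. It assumes, going only by the comments in `in02`, that `BA` copies rax into rbx, `nA` negates rax, and `+B` adds rbx to rax; the address used for `hw` is made up.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide, so arithmetic wraps around

def string_length(hw, he):
    """Mimic the register steps in in02 (instruction meanings are assumed)."""
    rax = he                     # im / --he
    rbx = rax                    # BA  (assumed: rbx = rax)
    rax = hw                     # im / --hw
    rax = (-rax) & MASK64        # nA  (assumed: rax = -rax)
    rax = (rax + rbx) & MASK64   # +B  (assumed: rax += rbx)
    return rax                   # DA then puts this in rdx

HW = 0x40008a  # hypothetical address of the ::hw label
message = b"Hello, world!\n"
print(string_length(HW, HW + len(message)))  # 14
```
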
This time, commands are separated by newlines instead of semicolons.
Each line begins with a 2-character command identifier. There are some special identifiers, though:

- `::` marks a *label*
- `--` outputs a label's (absolute) address
- `:-` outputs a label's relative address
- `##` outputs a number

All other commands work like they did in the previous compiler. If you scroll down in the
`in01` source file, you'll see the full command table.

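The dispatch on the first two characters of a line can be sketched in Python roughly like this. It's only an illustration, not how `in01` actually does it: `classify` and `LABEL_COMMANDS` are made-up names, and it assumes that label names are always 2 characters (as they are in `in02`) and that numbers use lowercase hex digits.

```python
import re

LABEL_COMMANDS = {
    "::": "define label",
    "--": "absolute address",
    ":-": "relative address",
}

def classify(line):
    """Split one source line into (kind, argument); anything after that is a comment."""
    ident, rest = line[:2], line[2:]
    if ident in LABEL_COMMANDS:
        return LABEL_COMMANDS[ident], rest[:2]  # assumed: 2-character label names
    if ident == "##":
        # the number is the run of hex digits right after ##
        return "number", re.match(r"[0-9a-f]*", rest).group()
    return "table command", ident  # looked up in the command table

for line in ["::he end of hello world", "##3c. exit = syscall 0x3c", "JA set rdi to 1 (stdout)"]:
    print(classify(line))
```

Everything that isn't one of the four special identifiers falls through to the last `return`, which is where a lookup in the command table would happen.
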
## labels

Labels are the most important new feature of this language.

## two passes?

## other features

Now, instead of writing out each of the 8 bytes making up a number,
we can just write it in hexadecimal (e.g. `##3c.` for `3c 00 00 00 00 00 00 00`),
and the compiler will automatically
extend it to 8 bytes.
This is especially nice because we don't need to write numbers backwards
for little-endianness anymore!
A number cannot be the last thing on a line (this was
to make the compiler simpler to write), so I'm adding a `.` at the end of
each one to avoid making that mistake.

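As a quick check of what "extend it to 8 bytes" means, here is the same conversion in Python (`encode_number` is just a name picked for this sketch, and it takes the hex digits without the `##` prefix or trailing `.`):

```python
def encode_number(hex_digits):
    """Extend a hexadecimal number to 8 bytes, least significant byte first."""
    return int(hex_digits, 16).to_bytes(8, "little")

print(encode_number("3c").hex(" "))   # 3c 00 00 00 00 00 00 00
print(encode_number("333").hex(" "))  # 33 03 00 00 00 00 00 00
```
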
Anything after a command is treated as a comment;
additionally, `//` can be used for comments on their own lines.
I decided to implement them as simply as possible:
I just added the command `//` to the command table, which outputs the byte `0x90`. This
means "do nothing" (`nop`) in x86-64.
Note that this means that the following code will not work as expected:
```
im
// load the value 0x333 into rax
##333.
```
since `0x90` gets inserted between the "load immediate" instruction code and the immediate.

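To make the problem concrete, here is a small Python sketch of the bytes involved. It assumes that `im` is emitted as `48 b8` (the x86-64 encoding of `mov rax, imm64`); check the command table in `in01` before relying on that.

```python
IM = bytes.fromhex("48b8")  # assumed encoding of `im` (mov rax, imm64)
NOP = bytes([0x90])         # what `//` outputs

def imm(value):
    """8-byte little-endian immediate, as ## would output it."""
    return value.to_bytes(8, "little")

broken = IM + NOP + imm(0x333)  # im, then the // line, then ##333.
fixed = NOP + IM + imm(0x333)   # same thing with the comment moved above im

# The CPU takes the 8 bytes directly after the opcode as the immediate.
print(hex(int.from_bytes(broken[2:10], "little")))  # 0x33390
print(hex(int.from_bytes(fixed[3:11], "little")))   # 0x333
```

In the broken version the `0x90` becomes the lowest byte of the immediate, and the last byte of the intended number is left over to be decoded as the start of the next instruction. Putting the comment line before `im` avoids this.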
## limitations

Many of the limitations of our previous compilers apply to this one. Also,
if you use a label without defining it, the compiler uses address 0 instead of outputting
an error message. This could be fixed: if the value in the label table is 0, and if we are
on the second pass, output an error message. This compiler was already tedious enough
to implement, though!
But thanks to labels, for future compilers at least we won't have to calculate
any jump offsets manually.

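The undefined-label check suggested above could look something like this. This is a Python sketch of the idea, not the actual compiler: `assemble` and `BASE` are hypothetical, only `::` and `--` are modelled, and every other command is pretended to output a single byte.

```python
BASE = 0x400000  # hypothetical load address, so a real label is never 0

def assemble(lines):
    """Run two passes over `lines`; only `::` and `--` are modelled."""
    labels = {}
    output = b""
    for pass_number in (1, 2):
        output = b""
        for line in lines:
            ident, name = line[:2], line[2:4]
            if ident == "::":
                labels[name] = BASE + len(output)
            elif ident == "--":
                address = labels.get(name, 0)  # missing entries read as 0
                if pass_number == 2 and address == 0:
                    raise ValueError("undefined label: " + name)
                output += address.to_bytes(8, "little")
            else:
                output += b"\x90"  # stand-in: pretend every other command is 1 byte
    return output

program = ["--he", "::hw", "'H", "'i", "::he"]
print(assemble(program).hex(" "))  # 0a 00 40 00 00 00 00 00 90 90
```

On the first pass `--he` is seen before `::he` is defined, so it outputs 0, but it still takes up 8 bytes and every label ends up with the right address. On the second pass the table is complete, so a 0 can only mean the label was never defined.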