readme tweaks, mainly
parent 3255cd32d7
commit 2288e47516
13 changed files with 177 additions and 84 deletions
@@ -1,7 +1,9 @@
all: out01 out02 README.html
out01: in01
../01/out00
out02: out01
out02: out01 in02
./out01
%.html: %.md ../markdown
../markdown $<
clean:
rm -f out01 out02 README.html

116 02/README.md
@@ -1,13 +1,15 @@
# stage 02

The compiler for this stage is in the file `in01`, an input for our previous compiler.
The specifics of how this compiler works are in the comments in that file, but here I'll
So if you run `../01/out00`, you'll get the file `out01`, which is
this stage's compiler.
The specifics of how this compiler works are in the comments in `in01`, but here I'll
give an overview.
Let's take a look at `in02`, an example input file for this compiler:
```
jm
:-co jump to code
::hw
::hw start of hello world
'H
'e
'l
@@ -23,11 +25,12 @@ jm
'!
\n
::he end of hello world



::co start of code
//
// now we'll calculate the length of the hello world string
// calculate the length of the hello world string
// by subtracting hw from he.
//
im
--he
BA
@@ -36,7 +39,7 @@ im
nA
+B
DA put length in rdx
// okay now we can write it
// okay now write it
im
##1.
JA set rdi to 1 (stdout)
@@ -54,56 +57,123 @@ im
sy
```

You can try adding more characters to the hello world message, and it'll just work;
the length of the text is computed automatically!
We can compile it by running `./out01`. This will produce
the executable `out02`, which you can run. It prints
`Hello, world!`.

This time, commands are separated by newlines instead of semicolons.
Each line begins with a 2-character command identifier. There are some special identifiers though:
In this language,
commands are separated by newlines instead of semicolons.
Each line begins with a 2-character command.
All of the commands from the previous compiler are here,
plus six new ones:

- `::` marks a *label*
- `--` outputs a label's (absolute) address
- `:-` outputs a label's relative address
- `##` outputs a number

All other commands work like they did in the previous compiler—if you scroll down in the
`in01` source file, you'll see the full command table.
- `//` is for comments
- `\n\n` does nothing (used for spacing)

## labels

Labels are the most important new feature of this language.
A line like
```
::xy
```
associates the name `xy` with the address of the next byte of the program.
In the example program, `hw` is associated with `0x40007d`,
which is the virtual memory address of the `Hello, world!` data.
We can then use
```
--xy
```
to output that address, and
```
:-xy
```
to output it relative to the current address.
So now instead of computing how far to jump, we can just jump to a label, e.g.
```
jm
:-xy (use the relative address, because jumps are relative in x86-64)
```
And instead of figuring out the address of a piece of data, we can just use its label:
```
im
--xy
// rax now points to the data at the label "::xy"
```
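
Here's a rough Python sketch of what `jm` followed by `:-xy` has to produce. It assumes `jm` emits the one-byte `jmp rel32` opcode `0xE9`. The `jm` lines in `in01` are each followed by 4 displacement bytes, which fits, but this is still an assumption. In x86-64 that displacement is a signed 32-bit offset measured from the end of the 5-byte jump instruction, not from its start. `encode_jmp` is a made-up helper for illustration.

```python
import struct

JMP_REL32 = 0xE9  # assumption: the opcode byte that `jm` emits


def encode_jmp(instr_addr: int, target: int) -> bytes:
    """Encode a `jmp rel32` placed at instr_addr that lands on target.

    The displacement is relative to the next instruction, i.e.
    instr_addr + 5 (1 opcode byte + 4 displacement bytes)."""
    disp = target - (instr_addr + 5)
    # "<i": little-endian signed 32-bit, so backward jumps wrap to ff bytes
    return bytes([JMP_REL32]) + struct.pack("<i", disp)


print(encode_jmp(0x400078, 0x40009D).hex())  # e920000000 (forward 0x20 bytes)
print(encode_jmp(0x400100, 0x4000C8).hex())  # e9c3ffffff (backward)
```

The signed encoding is why backward jumps in `in01`, like `;jm;38;ff;ff;ff continue loop`, end in `ff` bytes: `38 ff ff ff` is -200.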

This also lets us compute the length of the hello world string automatically!
By taking the address of the end of the string (`he`) and subtracting the
start (`hw`), we get the length in bytes.
So you can try adding more characters to the hello world message, and it'll just work.

All labels must be two ASCII characters. The address of each label is stored
as a 32-bit number in the "label table". This is sort of like the command table—the
index of the label `xy` is `128 * x + y`. Specifically, the entry for `xy` is at
`0x420000 + 4 * (128 * x + y)`, since the label table starts at `0x420000`
and each entry is 4 bytes.
When we encounter `::xy`, we get the current position in the output file
(using `lseek`), add the address of the start of the file (`0x400000`),
and store that in the label table.
When we encounter `:-xy` or `--xy`, we look up `xy` in the label table,
and write the address (subtracting the current address for `:-`) to the output file.
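
Here's a minimal Python model of the label-table bookkeeping above. It is a sketch, not a transcription of `in01`:

- A zero-filled `bytearray` stands in for the memory at `0x420000`.
- Each entry is packed as a 4-byte little-endian value, since x86-64 is little-endian.
- The current output position comes from `os.lseek(fd, 0, os.SEEK_CUR)`.

The names `entry_address` and `LabelTable` are hypothetical.

```python
import os
import struct
import tempfile

LABEL_TABLE_BASE = 0x420000  # start of the label table
FILE_BASE = 0x400000         # virtual address of the start of the output file


def entry_address(name: str) -> int:
    """Address of the 4-byte label-table entry for a two-character label."""
    if len(name) != 2 or not name.isascii():
        raise ValueError("labels must be two ASCII characters")
    x, y = ord(name[0]), ord(name[1])
    return LABEL_TABLE_BASE + 4 * (128 * x + y)


class LabelTable:
    """128 * 128 entries of 4 bytes each, all zero to begin with."""

    def __init__(self) -> None:
        self.mem = bytearray(4 * 128 * 128)

    def _offset(self, name: str) -> int:
        return entry_address(name) - LABEL_TABLE_BASE

    def define(self, name: str, fd: int) -> None:
        """`::xy`: store the address the next byte written to fd will have."""
        addr = FILE_BASE + os.lseek(fd, 0, os.SEEK_CUR)
        struct.pack_into("<I", self.mem, self._offset(name), addr)

    def lookup(self, name: str) -> int:
        """`--xy` / `:-xy`: a label that isn't defined yet reads as 0."""
        return struct.unpack_from("<I", self.mem, self._offset(name))[0]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\xe9\x00\x00\x00\x00")  # e.g. `jm` + `:-co` already written
        table = LabelTable()
        table.define("hw", fd)
        print(hex(table.lookup("hw")))   # 0x400005
        print(hex(entry_address("hw")))  # 0x42d1dc
    finally:
        os.close(fd)
        os.remove(path)
```

With 128 * 128 four-byte entries the table is `0x10000` bytes, so it spans `0x420000` up to `0x42ffff`.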

## two passes?

This compiler actually needs to read through the source code,
and output an executable, twice.
This is because a label may be defined *after* it is used, e.g.:
```
jm
:-aa jump forward
...
::aa this is where we're jumping to
...
```
In the first pass, the `:-aa` will
treat `aa` as having an address of 0. Then when
we get to `::aa`, the address in the label table will be corrected.
At the end of the first pass, we seek back to the start
of the input and output files,
and run the exact same code for the second pass.
But this time, the correct address of `aa` is used, namely the
one we calculated in the first pass.
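
To see why re-running the same code fixes forward references, here's a toy two-pass assembler in Python. It is a reduced model under simplifying assumptions, not `in01`:

- It only knows `::xy`, `--xy`, `:-xy` and single characters (`'c`).
- It writes 4 little-endian bytes for both label commands.
- It measures `:-` from the end of its 4-byte field, which is what an x86-64 `jmp rel32` expects.

It relies on the same property the real compiler needs: every command outputs the same number of bytes on both passes, so each `::` label gets the same address the second time.

```python
import struct

BASE = 0x400000  # address of the start of the output file


def assemble_pass(lines, labels):
    """Run one pass over the source, filling in `labels` as ::xy lines appear.

    Labels that haven't been defined yet read as 0, which is what
    happens to forward references on the first pass."""
    out = bytearray()
    for line in lines:
        cmd, name = line[:2], line[2:4]  # anything after that is a comment
        if cmd == "::":
            labels[name] = BASE + len(out)
        elif cmd == "--":
            out += struct.pack("<I", labels.get(name, 0))
        elif cmd == ":-":
            end_of_field = BASE + len(out) + 4
            out += struct.pack("<i", labels.get(name, 0) - end_of_field)
        elif cmd.startswith("'"):
            out += cmd[1:].encode("ascii")
    return bytes(out)


def assemble(lines):
    labels = {}
    assemble_pass(lines, labels)         # pass 1: output is wrong, labels are right
    return assemble_pass(lines, labels)  # pass 2: same code, correct addresses


src = [":-aa jump forward", "'x", "::aa this is where we're jumping to", "'y"]
print(assemble_pass(src, {}).hex())  # fcffbfff7879 (first pass alone)
print(assemble(src).hex())           # 010000007879 (after both passes)
```

On the first pass `aa` reads as 0, so the displacement is garbage. On the second pass it is `1`, which skips exactly the `'x` byte.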


## other features

Now instead of writing out each of the 8 bytes making up a number,
we can just write it in hexadecimal (e.g. `##3c.` for `3c 00 00 00 00 00 00 00`),
and the compiler will automatically
extend it to 8 bytes.
we can just write it in hexadecimal, e.g. `##1c4.` for `c4 01 00 00 00 00 00 00`.
This is especially nice because we don't need to write numbers backwards
for little-endianness anymore!
Numbers cannot appear at the end of a line (this was
to make the compiler simpler to write), so I'm adding a `.` at the end of
Numbers cannot appear at the end of a line (this made
the compiler simpler to write), so I'm adding a `.` at the end of
each one to avoid making that mistake.
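
Here's a short Python sketch of the `##` encoding described above. The hex digits before the terminating `.` are parsed and packed as an 8-byte little-endian integer, which is exactly the byte reversal we used to do by hand. `encode_number` is a hypothetical helper and only handles well-formed input.

```python
import struct


def encode_number(line: str) -> bytes:
    """Encode a `##<hex digits>.` command as 8 little-endian bytes."""
    if not line.startswith("##"):
        raise ValueError("not a number command")
    digits = line[2:].split(".", 1)[0]  # hex digits up to the terminating '.'
    # "<Q": little-endian unsigned 64-bit; struct.error if it doesn't fit
    return struct.pack("<Q", int(digits, 16))


for src in ["##3c.", "##1c4.", "##1."]:
    print(src, encode_number(src).hex(" "))
# ##3c. 3c 00 00 00 00 00 00 00
# ##1c4. c4 01 00 00 00 00 00 00
# ##1. 01 00 00 00 00 00 00 00
```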

Anything after a command is treated as a comment;
additionally `//` can be used for comments on their own lines.
I decided to implement them as simply as possible:
I decided to implement this as simply as possible:
I just added the command `//` to the command table, which outputs the byte `0x90`—this
means "do nothing" (`nop`) in x86-64.
Note that this means that the following code will not work as expected:
means ["do nothing"](https://en.wikipedia.org/wiki/No-op)
in x86-64.
Note that the following code will not work as expected:
```
im
// load the value 0x333 into rax
##333.
```
since `0x90` gets inserted between the "load immediate" instruction code, and the immediate.
since `0x90` gets inserted between the "load immediate" instruction code and the immediate.
`\n\n` works identically, and lets us space out code a bit. But be careful:
the number of blank lines must be a multiple of 3!
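
Here's the byte-level view of that pitfall, as a Python sketch. It assumes `im` emits `48 b8`, the x86-64 encoding of `mov rax, imm64`; the text above only calls it the "load immediate" instruction code. It also assumes `//` emits the single `nop` byte `0x90`. The CPU still treats the 8 bytes right after `48 b8` as the immediate. So `0x90` ends up as its lowest byte, and the last byte of the number is left over as the start of the next instruction.

```python
import struct

MOV_RAX_IMM64 = b"\x48\xb8"  # assumption: the bytes `im` emits
NOP = b"\x90"                # what `//` (and `\n\n`) emit

number = struct.pack("<Q", 0x333)      # what `##333.` emits
intended = MOV_RAX_IMM64 + number
actual = MOV_RAX_IMM64 + NOP + number  # a comment line sits in between

# Either way, the CPU takes the 8 bytes after the opcode as the immediate.
intended_rax = struct.unpack("<Q", intended[2:10])[0]
actual_rax = struct.unpack("<Q", actual[2:10])[0]
leftover = actual[10:]

print(hex(intended_rax))  # 0x333
print(hex(actual_rax))    # 0x33390
print(leftover.hex())     # 00
```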

## limitations

Many of the limitations of our previous compilers apply to this one. Also,
if you use a label without defining it, it uses address 0, rather than outputting
an error message. This could be fixed: if the value in the label table is 0, and if we are
an error message. This could be fixed: if the value in the label table is 0 and we are
on the second pass, output an error message. This compiler was already tedious enough
to implement, though!
But thanks to labels, for future compilers at least we won't have to calculate

24 02/in01
@@ -3,7 +3,7 @@
;'i;'n;'0;'2;00 (0x40007d) input filename
;'o;'u;'t;'0;'2;00 (0x400082) output filename
;00;00;' ;'n;'o;'t;' ;'r;'e;'c;'o;'g;'n;'i;'z;'e;'d;\n;00;00;00;00;00;00 (0x400088) error message/where we read to
;00 (0x4000a0) stores which pass we're on (1 for second pass)
;00 (0x4000a0) stores which pass we're on (0 for first pass, 1 for second pass)
;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00 (0x4000a8) used for output
unused padding
@@ -180,11 +180,11 @@ okay it's 0-9

;+B
;BA
okay we now have a digit in RBX
okay we now have a digit in rbx
;AR
;<I;04
;+B
;RA store away in RBP
;RA store away in rbp
;jm;38;ff;ff;ff continue loop

unused padding
@@ -195,7 +195,7 @@ unused padding
;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00

okay we have a full number in RBP, time to write it to the file
okay we have a full number in rbp, time to write it to the file.
start by putting it at address 0x4000a8
;im;a8;00;40;00;00;00;00;00
;BA
@@ -210,7 +210,7 @@ now write
;IA
;im;08;00;00;00;00;00;00;00 write 8 bytes
;DA
;im;01;00;00;00;00;00;00;00 write
;im;01;00;00;00;00;00;00;00 write
;sy

;jm;c3;03;00;00 skip to newline
@@ -327,11 +327,11 @@ subtract current address
;nA;+B
;RA store relative address in rbp

now we want to write eax to the output file.
now we want to write ebp to the output file.
start by putting it at address 0x4000a8
;im;a8;00;40;00;00;00;00;00
;BA
;AR put relative address in rax
;AR
;sd

now write
@@ -341,7 +341,7 @@ now write
;IA
;im;04;00;00;00;00;00;00;00 4 bytes
;DA
;im;01;00;00;00;00;00;00;00 write
;im;01;00;00;00;00;00;00;00 write
;sy

;jm;66;01;00;00 skip to newline
@@ -368,7 +368,7 @@ it's not a label or a number. let's look it up in the instruction table.
;BA
;RA store away address of command text in rbp
;zA;lb
;DA number of bytes to write (used for syscall if no error)
;DA number of bytes to write (used for syscall if command exists)
;BA
;zA
;cm;jn;54;00;00;00 check if # of bytes is 0, if not, skip outputting error
@@ -392,7 +392,7 @@ this is a real command
;im;01;00;00;00;00;00;00;00 add 1 because we don't want to write the length
;+B
;IA address of data to write
;im;04;00;00;00;00;00;00;00 out file descriptor
;im;04;00;00;00;00;00;00;00 out file descriptor
;JA
;im;01;00;00;00;00;00;00;00 write
;sy
@@ -1777,7 +1777,7 @@ the formatting changed appropriately.
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;01;90;00;00;00;00;00;00 \n\n
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
@@ -6550,7 +6550,7 @@ the formatting changed appropriately.
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;01;90;00;00;00;00;00;00
;01;90;00;00;00;00;00;00 // comments
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00
;00;00;00;00;00;00;00;00

11 02/in02
@@ -1,6 +1,6 @@
jm
:-co jump to code
::hw
::hw start of hello world
'H
'e
'l
@@ -16,11 +16,12 @@ jm
'!
\n
::he end of hello world



::co start of code
//
// now we'll calculate the length of the hello world string
// calculate the length of the hello world string
// by subtracting hw from he.
//
im
--he
BA
@@ -29,7 +30,7 @@ im
nA
+B
DA put length in rdx
// okay now we can write it
// okay now write it
im
##1.
JA set rdi to 1 (stdout)