readme tweaks, mainly
parent
3255cd32d7
commit
2288e47516
13 changed files with 177 additions and 84 deletions
|
@ -3,3 +3,5 @@ out00: in00
|
|||
./hexcompile
|
||||
%.html: %.md ../markdown
|
||||
../markdown $<
|
||||
clean:
|
||||
rm -f out00 README.html
|
||||
|
|
26
00/README.md
|
@ -102,7 +102,7 @@ execute-enabled. Normally people don't do this, for security, but we won't worry
|
|||
about that (don't compile any untrusted code with any compiler from this series!)
|
||||
Without further ado, here's the contents of the program header:
|
||||
|
||||
- `01 00 00 00` Segment type 1 (this should be loaded into memory)
|
||||
- `01 00 00 00` Segment type 1 (this segment should be loaded into memory)
|
||||
- `07 00 00 00` Flags = RWE (readable, writeable, and executable)
|
||||
- `78 00 00 00 00 00 00 00` Offset in file = 120 bytes
|
||||
- `78 00 40 00 00 00 00 00` Virtual address = 0x400078
|
||||
|
@ -114,7 +114,7 @@ memory address that the segment will be loaded to.
|
|||
Nowadays, computers use virtual memory, meaning that
|
||||
addresses in our program don't actually correspond to where the memory is
|
||||
physically stored in RAM (the CPU translates between virtual and physical
|
||||
memory addresses). There are many reasons for this: making sure each process has
|
||||
addresses). There are many reasons for this: making sure each process has
|
||||
its own memory space, memory protection, etc. You can read more about it
|
||||
elsewhere.
|
||||
|
||||
|
@ -130,7 +130,7 @@ each page (block) of memory is 4096 bytes long, and has to start at an address
|
|||
that is a multiple of 4096. Our program needs to be loaded into a memory page,
|
||||
so its *virtual address* needs to be a multiple of 4096. We're using `0x400000`.
|
||||
But wait! Didn't we use `0x400078` for the virtual address? Well, yes, but that's
|
||||
because the *data in the file* is loaded to address `0x400078`. The actual page
|
||||
because the segment's data is loaded to address `0x400078`. The actual page
|
||||
of memory that the OS will allocate for our segment will start at `0x400000`. The
|
||||
reason we need to start `0x78` bytes in is that Linux expects the data in the
|
||||
file to be at the same position in the page as when it will be loaded, and it
|
||||
|
@ -156,7 +156,8 @@ These instructions execute syscall `2` with arguments `0x40026d`, `0`.
|
|||
If you're familiar with C code, this is `open("in00", O_RDONLY)`.
|
||||
A syscall is the mechanism which lets software ask the kernel to do things.
|
||||
[Here](https://filippo.io/linux-syscall-table/) is a nice table of syscalls you
|
||||
can look through if you're interested. You can also install `strace` (e.g. with
|
||||
can look through if you're interested. You can also install
|
||||
[strace](https://strace.io) (e.g. with
|
||||
`sudo apt install strace`) and run `strace ./hexcompile` to see all the syscalls
|
||||
our program does.
|
||||
Syscall #2, on 64-bit Linux, is `open`. It's used to open a file. You can read
|
||||
|
@ -175,13 +176,13 @@ descriptor Linux gave us. This is because Linux assigns file descriptor numbers
|
|||
sequentially, starting from
|
||||
[0 for stdin, 1 for stdout, 2 for stderr](https://en.wikipedia.org/wiki/Standard_streams),
|
||||
and then 3, 4, 5, ... for any files our program opens. So
|
||||
this file, the first one our program opens, will have descriptor `3`.
|
||||
this file, the first one our program opens, will have descriptor 3.
|
||||
|
||||
Now we open our output file:
|
||||
|
||||
- `48 b8 72 02 40 00 00 00 00 00` `mov rax, 0x400272`
|
||||
- `48 89 c7` `mov rdi, rax`
|
||||
- `48 b8 41 02 00 00 00 00 00 00` `mov rax, 0x41`
|
||||
- `48 b8 41 02 00 00 00 00 00 00` `mov rax, 0x241`
|
||||
- `48 89 c6` `mov rsi, rax`
|
||||
- `48 b8 ed 01 00 00 00 00 00 00` `mov rax, 0o755`
|
||||
- `48 89 c2` `mov rdx, rax`
|
||||
|
@ -193,11 +194,12 @@ similar to our first call, with two important differences: first, we specify
|
|||
`0x241` as the second argument. This tells Linux that we are writing to the
|
||||
file (`O_WRONLY = 0x01`), that we want to create it if it doesn't exist
|
||||
(`O_CREAT = 0x40`), and that we want to delete any previous contents it had
|
||||
(`O_TRUNC = 0x200`). Secondly, we are setting the third argument this time. It
|
||||
(`O_TRUNC = 0x200`). Secondly, we're setting the third argument this time. It
|
||||
specifies the permissions our file is created with (`0o755` means user
|
||||
read/write/execute, group/other read/execute). This is not very important to
|
||||
the actual execution of the program, so don't worry if you don't know
|
||||
about UNIX permissions.
|
||||
Note that the output file's descriptor will be 4.
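
For readers who think in C, here is a rough sketch of what this second `open` call amounts to. The function name `open_output` is made up for illustration and is not part of the series. It assumes the standard `<fcntl.h>` constants, which on x86-64 Linux have the values quoted above.

```c
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: open `path` the way the hex program opens its
   output file. Returns the new file descriptor, or -1 on failure. */
int open_output(const char *path)
{
    /* On x86-64 Linux: O_WRONLY = 0x01, O_CREAT = 0x40, O_TRUNC = 0x200,
       so flags is 0x241, the value loaded into rsi. */
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    /* 0755 (octal) is 0x1ed: rwx for the user, r-x for group and other.
       This is the value loaded into rdx. */
    mode_t mode = 0755;
    return open(path, flags, mode);
}
```

Unlike the hex program, C code would normally check the return value for `-1` before using the descriptor.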
|
||||
|
||||
Now we can start reading from the file. We're going to loop back to this part of
|
||||
the code every time we want to read a new hexadecimal number from the input
|
||||
|
@ -223,13 +225,13 @@ We're telling Linux to output to `0x40026a`, which is just a part of this
|
|||
segment (see further down). Normally you would read to a different segment of
|
||||
the program from where the code is, but we want this to be as simple as
|
||||
possible.
|
||||
The number of bytes *actually read*, taking into account that we might have
|
||||
The number of bytes *actually* read, taking into account that we might have
|
||||
reached the end of the file, is stored in `rax`.
|
||||
|
||||
- `48 89 c3` `mov rbx, rax`
|
||||
- `48 b8 03 00 00 00 00 00 00 00` `mov rax, 3`
|
||||
- `48 39 d8` `cmp rax, rbx`
|
||||
- `0f 8f 50 01 00 00` `jg 0x400250`
|
||||
- `0f 8f 50 01 00 00` `jg +0x150 (0x400250)`
|
||||
|
||||
This tells the CPU to jump to a later part of the code (address `0x400250`) if 3
|
||||
is greater than the number of bytes we got, in other words, if we reached the
|
||||
|
@ -307,7 +309,7 @@ Okay, now `rax` contains the byte specified by the two hex digits we read.
|
|||
- `48 93` `xchg rax, rbx`
|
||||
- `88 03` `mov byte [rbx], al`
|
||||
|
||||
Write the byte to a specific memory location (address `0x40026c`).
|
||||
Put the byte in a specific memory location (address `0x40026c`).
|
||||
|
||||
- `48 b8 04 00 00 00 00 00 00 00` `mov rax, 4`
|
||||
- `48 89 c7` `mov rdi, rax`
|
||||
|
@ -356,7 +358,7 @@ This is where we conditionally jumped to way back when we determined if we
|
|||
reached the end of the file. This calls syscall #60, `exit`, with one argument,
|
||||
0 (exit code 0, indicating we exited successfully).
|
||||
|
||||
Normally, you should close file descriptors (with syscall #3), to tell Linux you're
|
||||
Normally, you would close file descriptors (with syscall #3), to tell Linux you're
|
||||
done with them, but we don't need to. It'll automatically close all our open
|
||||
file descriptors when our program exits.
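
Putting the pieces together, the whole program behaves roughly like the C sketch below. This is an illustration under assumptions, not the author's code: the names `hexcompile` and `hex_digit` are invented here, the file names are passed in rather than hard-coded, the digit conversion assumes `0-9` and lowercase `a-f`, and the checks on `open` are an addition the hex program does not make.

```c
#include <fcntl.h>
#include <unistd.h>

/* Value of one hex digit. Assumes 0-9 or lowercase a-f and does no
   validation, since the stage 00 program trusts its input. */
static int hex_digit(unsigned char c)
{
    return c >= 'a' ? c - 'a' + 10 : c - '0';
}

/* Hypothetical C rendering of the stage 00 loop.
   Returns 0 on success, 1 if a file could not be opened. */
int hexcompile(const char *in_path, const char *out_path)
{
    unsigned char buf[3];   /* two hex digits plus one separator byte */
    unsigned char byte;
    int in, out;

    in = open(in_path, O_RDONLY);
    if (in < 0) return 1;
    out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (out < 0) { close(in); return 1; }

    /* Stop as soon as read() returns fewer than 3 bytes; this is the
       "3 is greater than the number of bytes we got" check. */
    while (read(in, buf, 3) == 3) {
        byte = (unsigned char)(hex_digit(buf[0]) * 16 + hex_digit(buf[1]));
        if (write(out, &byte, 1) != 1) break;
    }

    /* The hex program skips these and lets exit clean up instead. */
    close(in);
    close(out);
    return 0;
}
```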
|
||||
|
||||
|
@ -387,4 +389,4 @@ a while.
|
|||
But these problems aren't really a big deal. We'll only be running this on
|
||||
little programs and we'll be sure to check that our input is in the right
|
||||
format. And with that, we are ready to move on to the
|
||||
[next stage...](../01/README.md).
|
||||
[next stage...](../01/README.md)
|
||||
|
|
|
@ -5,3 +5,5 @@ out00: in00
|
|||
../00/hexcompile
|
||||
%.html: %.md ../markdown
|
||||
../markdown $<
|
||||
clean:
|
||||
rm -f out00 out01 README.html
|
||||
|
|
32
01/README.md
|
@ -8,7 +8,7 @@ is the executable for this stage's compiler. Run it (it'll read from the file
|
|||
`Hello, world!` when run. Let's take a look at the input we're providing to the
|
||||
stage 01 compiler, `in01`:
|
||||
|
||||
<pre><code>
|
||||
```
|
||||
|| ELF Header
|
||||
;im;01;00;00;00;00;00;00;00 file descriptor for stdout
|
||||
;JA
|
||||
|
@ -24,9 +24,9 @@ stage 01 compiler, `in01`:
|
|||
;sy
|
||||
;'H;'e;'l;'l;'o;',;' ;'w;'o;'r;'l;'d;'!;\n the string we're printing
|
||||
;
|
||||
</code></pre>
|
||||
```
|
||||
|
||||
Look at that! There are comments! Much nicer than just hexadecimal digit pairs.
|
||||
Look at that! There are even comments! Much nicer than just hexadecimal digit pairs.
|
||||
|
||||
## end result
|
||||
|
||||
|
@ -50,9 +50,9 @@ actually print out an error message and exit, rather than continuing as if
|
|||
nothing happened! Try adding `xx;` to the end of the file `in01`, and running
|
||||
`./out00`. You should get the error message:
|
||||
|
||||
<pre><code>
|
||||
```
|
||||
xx not recognized.
|
||||
</code></pre>
|
||||
```
|
||||
|
||||
Pretty cool, huh?
|
||||
Anyways let's see how this compiler actually works.
|
||||
|
@ -63,7 +63,7 @@ Writing in our stage 00 language is much nicer than editing an
|
|||
executable, because it's easier to move things around, and also, we can separate
|
||||
our program into lines! Let's take a look at the start:
|
||||
|
||||
<pre><code>
|
||||
```
|
||||
7f 45 4c 46
|
||||
02
|
||||
01
|
||||
|
@ -90,7 +90,7 @@ a8 00 40 00 00 00 00 00
|
|||
00 10 02 00 00 00 00 00
|
||||
00 10 02 00 00 00 00 00
|
||||
00 10 00 00 00 00 00 00
|
||||
</code></pre>
|
||||
```
|
||||
|
||||
This is the ELF header and program header. It's just like our last one, but with
|
||||
a couple of differences. First, our entry point is at offset 0xa8 instead of 0x78.
|
||||
|
@ -113,7 +113,7 @@ recognized."`
|
|||
- `00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00` (unused)
|
||||
|
||||
Here's the data for our program. As you can see from my annotations, we have the
|
||||
input and output file, as well as the error message. The command part of the
|
||||
input and output file names, as well as the error message. The command part of the
|
||||
error message is left blank for now (we'll fill it in when the code is actually
|
||||
run).
|
||||
|
||||
|
@ -182,8 +182,8 @@ program with exit code 0 (successful).
|
|||
- `48 01 d8` `add rax, rbx`
|
||||
|
||||
This here looks at the two bytes we read in (we'll call them `b1` and `b2`) and
|
||||
computes `b1 * 128 + b2` (more specifically `(b1 << 7) + b2`). This is the index
|
||||
in our command table corresponding to the two characters from the input file.
|
||||
computes `b1 * 128 + b2` (more specifically `(b1 << 7) + b2`). This is the corresponding index
|
||||
in our command table.
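
In C terms, the index computation (and the `shl rax, 3` that follows it, which scales the index by the 8-byte entry size) could be sketched like this. The function names are invented for illustration.

```c
#include <stddef.h>

/* Index of the two-character command b1,b2 in the command table. */
size_t command_index(unsigned char b1, unsigned char b2)
{
    return ((size_t)b1 << 7) + b2;   /* same as b1 * 128 + b2 */
}

/* Byte offset of that entry from the start of the table.
   Each entry is 8 bytes, hence the shift by 3. */
size_t command_offset(unsigned char b1, unsigned char b2)
{
    return command_index(b1, b2) << 3;
}
```

For the command `sy`, for example, the index is `115 * 128 + 121 = 14841`, so its entry starts 118728 bytes into the table.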
|
||||
|
||||
- `48 c1 e0 03` `shl rax, 3`
|
||||
- `48 89 c3` `mov rbx, rax`
|
||||
|
@ -211,7 +211,7 @@ is `03 48 89 c3`. We set the length to 0 for unused entries.
|
|||
So this code checks if the entry for this command starts with a zero byte. If it
|
||||
does, that means the two characters we read in don't actually correspond to a
|
||||
real command. If that's the case, this next bit of code is executed (otherwise
|
||||
it's skiped over):
|
||||
it's skipped over):
|
||||
|
||||
- `48 b8 02 00 00 00 00 00 00 00` `mov rax, 2 (stderr)`
|
||||
- `48 89 c7` `mov rdi, rax`
|
||||
|
@ -228,7 +228,7 @@ it's skiped over):
|
|||
- `00 00 00 00 00 00 00 00 00 00 00 00 00 00` (unused)
|
||||
|
||||
This prints our error message, now filled in with the specific unrecognized
|
||||
instruction, to standard error, and exits with code 1, to indicate failure.
|
||||
instruction, to standard error, then exits with code 1, to indicate failure.
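
The lookup and the error path can be sketched in C as follows. This is only an illustration: the table lives in an ordinary array, the names `command_table`, `define_command` and `lookup_command` are made up, command names are assumed to be ASCII, and the sketch returns `-1` where the real compiler exits with code 1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory command table: 128 * 128 entries of 8 bytes.
   Byte 0 of an entry is the length of the encoding (0 means unused);
   the encoding itself follows. */
static unsigned char command_table[128 * 128 * 8];

/* Start of the entry for a two-character ASCII command name. */
static unsigned char *entry_for(const char *name)
{
    size_t index = ((size_t)(unsigned char)name[0] << 7)
                 + (unsigned char)name[1];
    return command_table + 8 * index;
}

/* Fill in an entry (helper for this sketch; len must be at most 7). */
void define_command(const char *name, const unsigned char *code,
                    unsigned char len)
{
    unsigned char *entry = entry_for(name);
    entry[0] = len;
    memcpy(entry + 1, code, len);
}

/* Copy the encoding of `name` into `out` and return its length.
   For an unused entry, print "xx not recognized." to stderr and
   return -1. */
int lookup_command(const char *name, unsigned char *out)
{
    const unsigned char *entry = entry_for(name);
    if (entry[0] == 0) {
        fprintf(stderr, "%c%c not recognized.\n", name[0], name[1]);
        return -1;
    }
    memcpy(out, entry + 1, entry[0]);
    return entry[0];
}
```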
|
||||
|
||||
- `48 89 eb` `mov rbx, rbp`
|
||||
- `31 c0` `mov rax, 0`
|
||||
|
@ -273,7 +273,7 @@ all the way back to read the next command. Otherwise, we keep looping. This
|
|||
skips over any comments/whitespace we might have between a command and the
|
||||
following command.
|
||||
|
||||
And that's all the *code* for this compiler. Next comes some data.
|
||||
And that's all the *code* for this compiler. Next comes the command table.
|
||||
|
||||
First, there's a whole bunch of unused 0s. Then there's the line
|
||||
|
||||
|
@ -293,7 +293,7 @@ Which is the encoding of the `syscall` instruction.
|
|||
You can look through the rest of the table, if you want. But let's look at the
|
||||
very end:
|
||||
|
||||
<code><pre>
|
||||
```
|
||||
78
|
||||
7f 45 4c 46
|
||||
02
|
||||
|
@ -321,7 +321,7 @@ very end:
|
|||
00 00 08 00 00 00 00 00
|
||||
00 00 08 00 00 00 00 00
|
||||
00 10 00 00 00 00 00 00
|
||||
</code></pre>
|
||||
```
|
||||
|
||||
This is at the position for `||`, and it contains an ELF header. One thing you
|
||||
might notice is that we decided that each entry is 8 bytes long, but this one is
|
||||
|
@ -340,5 +340,5 @@ fixed this, but frankly I've had enough of writing code in hexadecimal. So let's
|
|||
move on to [stage 02](../02/README.md),
|
||||
now that we have a nicer language on our hands. From now
|
||||
on, since we have comments, I'm gonna do most of the explaining in the source file
|
||||
itself, rather than the README. But there'll still be a bit of stuff there each
|
||||
itself, rather than the README. But there'll still be some stuff there each
|
||||
time.
|
||||
|
|
|
@ -7,11 +7,12 @@ ff - Byte ff
|
|||
'a - Character a (byte 0x61)
|
||||
'! - Character ! (byte 0x21)
|
||||
etc.
|
||||
\n - Newline (byte 0x0a)
|
||||
|
||||
zA - Zero rax
|
||||
im - Set rax to an immediate value, e.g.
|
||||
im;05;00;00;00;00;00;00;00;
|
||||
will set rax to 5.
|
||||
im;05;00;00;00;00;00;00;00;
|
||||
will set rax to 5.
|
||||
|
||||
ax bx cx dx sp bp si di
|
||||
A B C D S R I J
|
||||
|
|
|
@ -1,7 +1,9 @@
|
|||
all: out01 out02 README.html
|
||||
out01: in01
|
||||
../01/out00
|
||||
out02: out01
|
||||
out02: out01 in02
|
||||
./out01
|
||||
%.html: %.md ../markdown
|
||||
../markdown $<
|
||||
clean:
|
||||
rm -f out01 out02 README.html
|
||||
|
|
116
02/README.md
|
@ -1,13 +1,15 @@
|
|||
# stage 02
|
||||
|
||||
The compiler for this stage is in the file `in01`, an input for our previous compiler.
|
||||
The specifics of how this compiler works are in the comments in that file, but here I'll
|
||||
So if you run `../01/out00`, you'll get the file `out01`, which is
|
||||
this stage's compiler.
|
||||
The specifics of how this compiler works are in the comments in `in01`, but here I'll
|
||||
give an overview.
|
||||
Let's take a look at `in02`, an example input file for this compiler:
|
||||
```
|
||||
jm
|
||||
:-co jump to code
|
||||
::hw
|
||||
::hw start of hello world
|
||||
'H
|
||||
'e
|
||||
'l
|
||||
|
@ -23,11 +25,12 @@ jm
|
|||
'!
|
||||
\n
|
||||
::he end of hello world
|
||||
|
||||
|
||||
|
||||
::co start of code
|
||||
//
|
||||
// now we'll calculate the length of the hello world string
|
||||
// calculate the length of the hello world string
|
||||
// by subtracting hw from he.
|
||||
//
|
||||
im
|
||||
--he
|
||||
BA
|
||||
|
@ -36,7 +39,7 @@ im
|
|||
nA
|
||||
+B
|
||||
DA put length in rdx
|
||||
// okay now we can write it
|
||||
// okay now write it
|
||||
im
|
||||
##1.
|
||||
JA set rdi to 1 (stdout)
|
||||
|
@ -54,56 +57,123 @@ im
|
|||
sy
|
||||
```
|
||||
|
||||
You can try adding more characters to the hello world message, and it'll just work;
|
||||
the length of the text is computed automatically!
|
||||
We can compile it by running `./out01`. This will produce
|
||||
the executable `out02`, which you can run. It prints
|
||||
`Hello, world!`.
|
||||
|
||||
This time, commands are separated by newlines instead of semicolons.
|
||||
Each line begins with a 2-character command identifier. There are some special identifiers though:
|
||||
In this language,
|
||||
commands are separated by newlines instead of semicolons.
|
||||
Each line begins with a 2-character command.
|
||||
All of the commands from the previous compiler are here,
|
||||
plus six new ones:
|
||||
|
||||
- `::` marks a *label*
|
||||
- `--` outputs a label's (absolute) address
|
||||
- `:-` outputs a label's relative address
|
||||
- `##` outputs a number
|
||||
|
||||
All other commands work like they did in the previous compiler—if you scroll down in the
|
||||
`in01` source file, you'll see the full command table.
|
||||
- `//` is for comments
|
||||
- `\n\n` does nothing (used for spacing)
|
||||
|
||||
## labels
|
||||
|
||||
Labels are the most important new feature of this language.
|
||||
A line like
|
||||
```
|
||||
::xy
|
||||
```
|
||||
associates the name `xy` with the address of the next byte of the program.
|
||||
In the example program, `hw` is associated with `0x40007d`,
|
||||
which is the virtual memory address of the `Hello, world!` data.
|
||||
We can then use
|
||||
```
|
||||
--xy
|
||||
```
|
||||
to output that address, and
|
||||
```
|
||||
:-xy
|
||||
```
|
||||
to output it relative to the current address.
|
||||
So now instead of computing how far to jump, we can just jump to a label, e.g.
|
||||
```
|
||||
jm
|
||||
:-xy (use the relative address, because jumps are relative in x86-64)
|
||||
```
|
||||
And instead of figuring out the address of a piece of data, we can just use its label:
|
||||
```
|
||||
im
|
||||
--xy
|
||||
// rax now points to the data at the label "::xy"
|
||||
```
|
||||
|
||||
This also lets us compute the length of the hello world string automatically!
|
||||
By taking the address of the end of the string (`he`) and subtracting the
|
||||
start (`hw`), we get the length in bytes.
|
||||
So you can try adding more characters to the hello world message, and it'll just work.
|
||||
|
||||
All labels must be two ASCII characters. The address of each label is stored
|
||||
as a 32-bit number in the "label table". This is sort of like the command table—the
|
||||
index of the label `xy` is `128 * x + y`. Specifically, the entry for `xy` is at
|
||||
`0x420000 + 4 * (128 * x + y)`, since the label table starts at `0x420000`
|
||||
and each entry is 4 bytes.
|
||||
When we encounter `::xy`, we get the current position in the output file
|
||||
(using `lseek`), add the address of the start of the file (`0x400000`),
|
||||
and store that in the label table.
|
||||
When we encounter `:-xy` or `--xy`, we look up `xy` in the label table,
|
||||
and write the address (subtracting the current address for `:-`) to the output file.
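
The address arithmetic described above can be written out in C like this. It is a sketch with invented names; what exactly counts as the "current address" for `:-` is left as a parameter, because the text here does not pin it down.

```c
#include <stdint.h>

#define FILE_BASE   0x400000u  /* virtual address of the start of the file */
#define LABEL_TABLE 0x420000u  /* virtual address of the label table */

/* Address of the 4-byte label-table entry for the label made of the
   characters x and y. */
uint32_t label_entry_address(unsigned char x, unsigned char y)
{
    return LABEL_TABLE + 4u * (128u * x + y);
}

/* Value stored by `::xy` when the output file position is file_pos. */
uint32_t label_value(uint32_t file_pos)
{
    return FILE_BASE + file_pos;
}

/* Value written by `:-xy`: the label's address minus the current one. */
int32_t relative_address(uint32_t label_addr, uint32_t current_addr)
{
    return (int32_t)(label_addr - current_addr);
}
```

For the label `hw` from the example, the entry sits at `0x42d1dc`, and a label defined at file offset `0x7d` gets the value `0x40007d`.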
|
||||
|
||||
## two passes?
|
||||
|
||||
This compiler actually needs to read through the source code,
|
||||
and output an executable, twice.
|
||||
This is because a label may be defined *after* it is used, e.g.:
|
||||
```
|
||||
jm
|
||||
:-aa jump forward
|
||||
...
|
||||
::aa this is where we're jumping to
|
||||
...
|
||||
```
|
||||
In the first pass, the `:-aa` will
|
||||
treat `aa` as having an address of 0. Then when
|
||||
we get to `::aa`, the address in the label table will be corrected.
|
||||
At the end of the first pass, we seek back to the start
|
||||
of the input and output files,
|
||||
and run the exact same code for the second pass.
|
||||
But this time, the correct address of `aa` is used, namely the
|
||||
one we calculated in the first pass.
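
Here is a toy model of the two-pass idea in C. It is a sketch under assumptions: the program is an array of `struct item` rather than a text file, the names and sizes are invented, and a label use just records the value it would output instead of writing bytes.

```c
#include <stddef.h>
#include <stdint.h>

enum item_kind { ITEM_BYTES, ITEM_DEFINE, ITEM_USE };

/* One line of a toy program. `size` is how many bytes the line emits
   (0 for a label definition); `value` receives what an ITEM_USE line
   would output. */
struct item {
    enum item_kind kind;
    char name[2];
    uint32_t size;
    uint32_t value;
};

static uint32_t labels[128 * 128];  /* starts out all zero */

/* Run one pass. The same function serves as pass 1 and pass 2. */
void run_pass(struct item *items, size_t n)
{
    uint32_t address = 0x400000;
    size_t i;
    for (i = 0; i < n; i++) {
        size_t index = 128 * (size_t)(unsigned char)items[i].name[0]
                     + (unsigned char)items[i].name[1];
        if (items[i].kind == ITEM_DEFINE)
            labels[index] = address;          /* like `::xy` */
        else if (items[i].kind == ITEM_USE)
            items[i].value = labels[index];   /* 0 if not defined yet */
        address += items[i].size;
    }
}
```

With a program that uses `aa` before defining it, the first call to `run_pass` records 0 for the use, and the second call records the address that the first one stored in `labels`.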
|
||||
|
||||
|
||||
## other features
|
||||
|
||||
Now instead of writing out each of the 8 bytes making up a number,
|
||||
we can just write it in hexadecimal (e.g. `##3c.` for `3c 00 00 00 00 00 00 00`),
|
||||
and the compiler will automatically
|
||||
extend it to 8 bytes.
|
||||
we can just write it in hexadecimal, e.g. `##1c4.` for `c4 01 00 00 00 00 00 00`.
|
||||
This is especially nice because we don't need to write numbers backwards
|
||||
for little-endianness anymore!
|
||||
Numbers cannot appear at the end of a line (this was
|
||||
to make the compiler simpler to write), so I'm adding a `.` at the end of
|
||||
Numbers cannot appear at the end of a line (this made
|
||||
the compiler simpler to write), so I'm adding a `.` at the end of
|
||||
each one to avoid making that mistake.
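
As an illustration of what `##` does, here is a C sketch that turns the digits into 8 little-endian bytes. The function name is invented, and it assumes digits are `0-9` or lowercase `a-f` and that any other character (such as the `.`) ends the number.

```c
#include <stdint.h>

/* Parse the hex digits that follow `##` and store the value in `out`
   as 8 little-endian bytes. Returns the number of digits consumed. */
int encode_number(const char *digits, unsigned char out[8])
{
    uint64_t value = 0;
    int n, i;
    for (n = 0; ; n++) {
        char c = digits[n];
        unsigned digit;
        if (c >= '0' && c <= '9') digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a' + 10);
        else break;                      /* e.g. the trailing '.' */
        value = (value << 4) + digit;    /* shift in one hex digit */
    }
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(value >> (8 * i));  /* low byte first */
    return n;
}
```

Calling `encode_number("1c4.", out)` fills `out` with `c4 01 00 00 00 00 00 00`.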
|
||||
|
||||
Anything after a command is treated as a comment;
|
||||
additionally `//` can be used for comments on their own lines.
|
||||
I decided to implement them as simply as possible:
|
||||
I decided to implement this as simply as possible:
|
||||
I just added the command `//` to the command table, which outputs the byte `0x90`—this
|
||||
means "do nothing" (`nop`) in x86-64.
|
||||
Note that this means that the following code will not work as expected:
|
||||
means ["do nothing"](https://en.wikipedia.org/wiki/No-op)
|
||||
in x86-64.
|
||||
Note that the following code will not work as expected:
|
||||
```
|
||||
im
|
||||
// load the value 0x333 into rax
|
||||
##333.
|
||||
```
|
||||
since `0x90` gets inserted between the "load immediate" instruction code, and the immediate.
|
||||
since `0x90` gets inserted between the "load immediate" instruction code and the immediate.
|
||||
`\n\n` works identically, and lets us space out code a bit. But be careful:
|
||||
the number of blank lines must be a multiple of 3!
|
||||
|
||||
## limitations
|
||||
|
||||
Many of the limitations of our previous compilers apply to this one. Also,
|
||||
if you use a label without defining it, it uses address 0, rather than outputting
|
||||
an error message. This could be fixed: if the value in the label table is 0, and if we are
|
||||
an error message. This could be fixed: if the value in the label table is 0 and we are
|
||||
on the second pass, output an error message. This compiler was already tedious enough
|
||||
to implement, though!
|
||||
But thanks to labels, for future compilers at least we won't have to calculate
|
||||
|
|
24
02/in01
|
@ -3,7 +3,7 @@
|
|||
;'i;'n;'0;'2;00 (0x40007d) input filename
|
||||
;'o;'u;'t;'0;'2;00 (0x400082) output filename
|
||||
;00;00;' ;'n;'o;'t;' ;'r;'e;'c;'o;'g;'n;'i;'z;'e;'d;\n;00;00;00;00;00;00 (0x400088) error message/where we read to
|
||||
;00 (0x4000a0) stores which pass we're on (1 for second pass)
|
||||
;00 (0x4000a0) stores which pass we're on (0 for first pass, 1 for second pass)
|
||||
;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00 (0x4000a8) used for output
|
||||
unused padding
|
||||
|
@ -180,11 +180,11 @@ okay it's 0-9
|
|||
|
||||
;+B
|
||||
;BA
|
||||
okay we now have a digit in RBX
|
||||
okay we now have a digit in rbx
|
||||
;AR
|
||||
;<I;04
|
||||
;+B
|
||||
;RA store away in RBP
|
||||
;RA store away in rbp
|
||||
;jm;38;ff;ff;ff continue loop
|
||||
|
||||
unused padding
|
||||
|
@ -195,7 +195,7 @@ unused padding
|
|||
;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00
|
||||
|
||||
okay we have a full number in RBP, time to write it to the file
|
||||
okay we have a full number in rbp, time to write it to the file.
|
||||
start by putting it at address 0x4000a8
|
||||
;im;a8;00;40;00;00;00;00;00
|
||||
;BA
|
||||
|
@ -210,7 +210,7 @@ now write
|
|||
;IA
|
||||
;im;08;00;00;00;00;00;00;00 write 8 bytes
|
||||
;DA
|
||||
;im;01;00;00;00;00;00;00;00 write
|
||||
;im;01;00;00;00;00;00;00;00 write
|
||||
;sy
|
||||
|
||||
;jm;c3;03;00;00 skip to newline
|
||||
|
@ -327,11 +327,11 @@ subtract current address
|
|||
;nA;+B
|
||||
;RA store relative address in rbp
|
||||
|
||||
now we want to write eax to the output file.
|
||||
now we want to write ebp to the output file.
|
||||
start by putting it at address 0x4000a8
|
||||
;im;a8;00;40;00;00;00;00;00
|
||||
;BA
|
||||
;AR put relative address in rax
|
||||
;AR
|
||||
;sd
|
||||
|
||||
now write
|
||||
|
@ -341,7 +341,7 @@ now write
|
|||
;IA
|
||||
;im;04;00;00;00;00;00;00;00 4 bytes
|
||||
;DA
|
||||
;im;01;00;00;00;00;00;00;00 write
|
||||
;im;01;00;00;00;00;00;00;00 write
|
||||
;sy
|
||||
|
||||
;jm;66;01;00;00 skip to newline
|
||||
|
@ -368,7 +368,7 @@ it's not a label or a number. let's look it up in the instruction table.
|
|||
;BA
|
||||
;RA store away address of command text in rbp
|
||||
;zA;lb
|
||||
;DA number of bytes to write (used for syscall if no error)
|
||||
;DA number of bytes to write (used for syscall if command exists)
|
||||
;BA
|
||||
;zA
|
||||
;cm;jn;54;00;00;00 check if # of bytes is 0, if not, skip outputting error
|
||||
|
@ -392,7 +392,7 @@ this is a real command
|
|||
;im;01;00;00;00;00;00;00;00 add 1 because we don't want to write the length
|
||||
;+B
|
||||
;IA address of data to write
|
||||
;im;04;00;00;00;00;00;00;00 out file descriptor
|
||||
;im;04;00;00;00;00;00;00;00 out file descriptor
|
||||
;JA
|
||||
;im;01;00;00;00;00;00;00;00 write
|
||||
;sy
|
||||
|
@ -1777,7 +1777,7 @@ the formatting changed appropriately.
|
|||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
;01;90;00;00;00;00;00;00 \n\n
|
||||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
|
@ -6550,7 +6550,7 @@ the formatting changed appropriately.
|
|||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
;01;90;00;00;00;00;00;00
|
||||
;01;90;00;00;00;00;00;00 // comments
|
||||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
;00;00;00;00;00;00;00;00
|
||||
|
|
11
02/in02
|
@ -1,6 +1,6 @@
|
|||
jm
|
||||
:-co jump to code
|
||||
::hw
|
||||
::hw start of hello world
|
||||
'H
|
||||
'e
|
||||
'l
|
||||
|
@ -16,11 +16,12 @@ jm
|
|||
'!
|
||||
\n
|
||||
::he end of hello world
|
||||
|
||||
|
||||
|
||||
::co start of code
|
||||
//
|
||||
// now we'll calculate the length of the hello world string
|
||||
// calculate the length of the hello world string
|
||||
// by subtracting hw from he.
|
||||
//
|
||||
im
|
||||
--he
|
||||
BA
|
||||
|
@ -29,7 +30,7 @@ im
|
|||
nA
|
||||
+B
|
||||
DA put length in rdx
|
||||
// okay now we can write it
|
||||
// okay now write it
|
||||
im
|
||||
##1.
|
||||
JA set rdi to 1 (stdout)
|
||||
|
|
6
Makefile
|
@ -2,6 +2,12 @@ all: markdown README.html
|
|||
$(MAKE) -C 00
|
||||
$(MAKE) -C 01
|
||||
$(MAKE) -C 02
|
||||
clean:
|
||||
$(MAKE) -C 00 clean
|
||||
$(MAKE) -C 01 clean
|
||||
$(MAKE) -C 02 clean
|
||||
rm -f markdown
|
||||
rm -f README.html
|
||||
markdown: markdown.c
|
||||
$(CC) -O2 -o markdown -Wall -Wconversion -Wshadow -std=c89 markdown.c
|
||||
README.html: markdown README.md
|
||||
|
|
27
README.md
|
@ -17,7 +17,14 @@ Note that the executables produced in this series will only run on
|
|||
64-bit Linux, because each OS/architecture combination would need its own separate
|
||||
executable.
|
||||
|
||||
The README for the first stage is [here](00/README.md).
|
||||
## table of contents
|
||||
|
||||
- [stage 00](00/README.md) - a program converting a text file with
|
||||
hexadecimal digit pairs to a binary file.
|
||||
- [stage 01](01/README.md) - a language with comments, and 2-character
|
||||
command codes.
|
||||
- [stage 02](02/README.md) - a language with labels
|
||||
- more coming soon (hopefully)
|
||||
|
||||
## prerequisite knowledge
|
||||
|
||||
|
@ -44,8 +51,7 @@ decimal.
|
|||
- ASCII, null-terminated strings
|
||||
- how pointers work
|
||||
- how floating-point numbers work
|
||||
- maybe some basic Intel-style x86-64 assembly (you can probably pick it up on
|
||||
the way though)
|
||||
- some basic Intel-style x86-64 assembly
|
||||
|
||||
It will help you a lot to know how to program (with any programming language),
|
||||
but it's not strictly necessary.
|
||||
|
@ -53,12 +59,11 @@ but it's not strictly necessary.
|
|||
## instruction set
|
||||
|
||||
x86-64 has a *gigantic* instruction set. The manual for it is over 2,000 pages
|
||||
long! So, it makes sense to select only a small subset of it to use for all the
|
||||
stages of our compiler. The set I've chosen can be found in `instructions.txt`.
|
||||
long! So it makes sense to select only a small subset of it to use.
|
||||
The set I've chosen can be found in `instructions.txt`.
|
||||
I think it achieves a pretty good balance between having few enough
|
||||
instructions to be manageable and having enough instructions to be useable.
|
||||
To be clear, you don't need to read that file to understand the series, at least
|
||||
not right away.
|
||||
To be clear, you don't need to read that file to understand the series.
|
||||
|
||||
## principles
|
||||
|
||||
|
@ -91,15 +96,15 @@ project can't necessarily even do that though, because the Linux kernel, which
|
|||
we depend on, is compiled from C, so we can't fully trust *it*. To *truly*
|
||||
create a fully trustable compiler, you'd need to manually write to a USB with a
|
||||
circuit, create an operating system from nothing (without even a text editor),
|
||||
and then follow this series, or maybe you don't even trust your CPU vendor...
|
||||
I'll leave that to someone else
|
||||
and then follow this series, or maybe you don't even trust your CPU...
|
||||
I'll leave that to someone else.
|
||||
|
||||
## license
|
||||
|
||||
```
|
||||
This project is in the public domain. Any copyright protections from any law
|
||||
for this project are forfeited by the author(s). No warranty is provided for
|
||||
this project, and the author(s) shall not be held liable in connection with it.
|
||||
are forfeited by the author(s). No warranty is provided, and the author(s)
|
||||
shall not be held liable in connection with it.
|
||||
```
|
||||
|
||||
## contributing
|
||||
|
|
|
@ -101,3 +101,4 @@ syscall
|
|||
>0f 05
|
||||
nop
|
||||
>90
|
||||
(more will be added as needed)
|
||||
|
|
|
@ -58,7 +58,8 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
|
|||
case '[': {
|
||||
/* link */
|
||||
char url2[256] = {0};
|
||||
const char *label, *url, *label_end, *url_end, *dot;
|
||||
const char *label, *url, *label_end, *url_end;
|
||||
char *dot;
|
||||
int n_label, n_url;
|
||||
|
||||
label = p+1;
|
||||
|
@ -88,7 +89,7 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
|
|||
/* replace links to md files with links to html files */
|
||||
strcpy(dot, ".html");
|
||||
}
|
||||
fprintf(out, "<a href=\"%s\" target=\"_blank\">%.*s</a>",
|
||||
fprintf(out, "<a href=\"%s\">%.*s</a>",
|
||||
url2, n_label, label);
|
||||
p = url_end;
|
||||
} break;
|
||||
|
|