readme tweaks, mainly

2021-11-10 12:55:41 -05:00
commit 2288e47516
parent 3255cd32d7
13 changed files with 177 additions and 84 deletions
--- a/00/Makefile
+++ b/00/Makefile
@ -3,3 +3,5 @@ out00: in00
 	./hexcompile
 %.html: %.md ../markdown
 	../markdown $<
+clean:
+	rm -f out00 README.html
--- a/00/README.md
+++ b/00/README.md
@ -102,7 +102,7 @@ execute-enabled. Normally people don't do this, for security, but we won't worry
 about that (don't compile any untrusted code with any compiler from this series!)
 Without further ado, here's the contents of the program header:

- `01 00 00 00` Segment type 1 (this should be loaded into memory)
+- `01 00 00 00` Segment type 1 (this segment should be loaded into memory)
 - `07 00 00 00` Flags = RWE (readable, writeable, and executable)
 - `78 00 00 00 00 00 00 00` Offset in file = 120 bytes
 - `78 00 40 00 00 00 00 00` Virtual address = 0x400078
@ -114,7 +114,7 @@ memory address that the segment will be loaded to.
 Nowadays, computers use virtual memory, meaning that
 addresses in our program don't actually correspond to where the memory is
 physically stored in RAM (the CPU translates between virtual and physical
-memory addresses). There are many reasons for this: making sure each process has
+addresses). There are many reasons for this: making sure each process has
 its own memory space, memory protection, etc. You can read more about it
 elsewhere.

@ -130,7 +130,7 @@ each page (block) of memory is 4096 bytes long, and has to start at an address
 that is a multiple of 4096. Our program needs to be loaded into a memory page,
 so its *virtual address* needs to be a multiple of 4096. We're using `0x400000`.
 But wait! Didn't we use `0x400078` for the virtual address? Well, yes but that's
-because the *data in the file* is loaded to address `0x400078`. The actual page
+because the segment's data is loaded to address `0x400078`. The actual page
 of memory that the OS will allocate for our segment will start at `0x400000`. The
 reason we need to start `0x78` bytes in is that Linux expects the data in the
 file to be at the same position in the page as when it will be loaded, and it
@ -156,7 +156,8 @@ These instructions execute syscall `2` with arguments `0x40026d`, `0`.
 If you're familiar with C code, this is `open("in00", O_RDONLY)`.
 A syscall is the mechanism which lets software ask the kernel to do things.
 [Here](https://filippo.io/linux-syscall-table/) is a nice table of syscalls you
-can look through if you're interested. You can also install `strace` (e.g. with
+can look through if you're interested. You can also install
+[strace](https://strace.io) (e.g. with
 `sudo apt install strace`) and run `strace ./hexcompile` to see all the syscalls
 our program does.
 Syscall #2, on 64-bit Linux, is `open`. It's used to open a file. You can read
@ -175,13 +176,13 @@ descriptor Linux gave us. This is because Linux assigns file descriptor numbers
 sequentially, starting from
 [0 for stdin, 1 for stdout, 2 for stderr](https://en.wikipedia.org/wiki/Standard_streams),
 and then 3, 4, 5, ... for any files our program opens. So
-this file, the first one our program opens, will have descriptor `3`.
+this file, the first one our program opens, will have descriptor 3.

 Now we open our output file:

 - `48 b8 72 02 40 00 00 00 00 00` `mov rax, 0x400272`
 - `48 89 c7` `mov rdi, rax`
- `48 b8 41 02 00 00 00 00 00 00` `mov rax, 0x41`
+- `48 b8 41 02 00 00 00 00 00 00` `mov rax, 0x241`
 - `48 89 c6` `mov rsi, rax`
 - `48 b8 ed 01 00 00 00 00 00 00` `mov rax, 0o755`
 - `48 89 c2` `mov rdx, rax`
@ -193,11 +194,12 @@ similar to our first call, with two important differences: first, we specify
 `0x241` as the second argument. This tells Linux that we are writing to the
 file (`O_WRONLY = 0x01`), that we want to create it if it doesn't exist
 (`O_CREAT = 0x40`), and that we want to delete any previous contents it had
-(`O_TRUNC = 0x200`). Secondly, we are setting the third argument this time.  It
+(`O_TRUNC = 0x200`). Secondly, we're setting the third argument this time. It
 specifies the permissions our file is created with (`0o755` means user
 read/write/execute, group/other read/execute). This is not very important to
 the actual execution of the program, so don't worry if you don't know 
 about UNIX permissions.
+Note that the output file's descriptor will be 4.
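As a quick sanity check on those constants, here is a small Python sketch (not part of the series, which sticks to machine code) that rebuilds the second and third arguments. The flag values are the ones quoted above for 64-bit Linux; the variable names are just for illustration.

```python
# Flag values as quoted in the text for 64-bit Linux.
O_WRONLY = 0x01   # open for writing
O_CREAT = 0x40    # create the file if it doesn't exist
O_TRUNC = 0x200   # throw away any previous contents

flags = O_WRONLY | O_CREAT | O_TRUNC
mode = 0o755      # rwx for the user, r-x for group and other

# Both values end up as little-endian 8-byte immediates in the program.
flags_bytes = flags.to_bytes(8, "little").hex(" ")
mode_bytes = mode.to_bytes(8, "little").hex(" ")

print(hex(flags), flags_bytes)
print(oct(mode), mode_bytes)
```

The byte strings it prints should line up with the immediates in the `mov rax, 0x241` and `mov rax, 0o755` lines.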

 Now we can start reading from the file. We're going to loop back to this part of
 the code every time we want to read a new hexadecimal number from the input
@ -223,13 +225,13 @@ We're telling Linux to output to `0x40026a`, which is just a part of this
 segment (see further down). Normally you would read to a different segment of
 the program from where the code is, but we want this to be as simple as
 possible.
-The number of bytes *actually read*, taking into account that we might have
+The number of bytes *actually* read, taking into account that we might have
 reached the end of the file, is stored in `rax`.

 - `48 89 c3` `mov rbx, rax`
 - `48 b8 03 00 00 00 00 00 00 00` `mov rax, 3`
 - `48 39 d8` `cmp rax, rbx`
- `0f 8f 50 01 00 00` `jg 0x400250`
+- `0f 8f 50 01 00 00` `jg +0x150 (0x400250)`

 This tells the CPU to jump to a later part of the code (address `0x400250`) if 3
 is greater than the number of bytes we got, in other words, if we reached the
@ -307,7 +309,7 @@ Okay, now `rax` contains the byte specified by the two hex digits we read.
 - `48 93` `xchg rax, rbx`
 - `88 03` `mov byte [rbx], al`

-Write the byte to a specific memory location (address `0x40026c`).
+Put the byte in a specific memory location (address `0x40026c`).

 - `48 b8 04 00 00 00 00 00 00 00` `mov rax, 4`
 - `48 89 c7` `mov rdi, rax`
@ -356,7 +358,7 @@ This is where we conditionally jumped to way back when we determined if we
 reached the end of the file. This calls syscall #60, `exit`, with one argument,
 0 (exit code 0, indicating we exited successfully).

-Normally, you should close files descriptors (with syscall #3), to tell Linux you're
+Normally, you would close file descriptors (with syscall #3), to tell Linux you're
 done with them, but we don't need to. It'll automatically close all our open
 file descriptors when our program exits.

@ -387,4 +389,4 @@ a while.
 But these problems aren't really a big deal. We'll only be running this on
 little programs and we'll be sure to check that our input is in the right
 format. And with that, we are ready to move on to the
-[next stage...](../01/README.md).
+[next stage...](../01/README.md)
--- a/01/Makefile
+++ b/01/Makefile
@ -5,3 +5,5 @@ out00: in00
 	../00/hexcompile
 %.html: %.md ../markdown
 	../markdown $<
+clean:
+	rm -f out00 out01 README.html
--- a/01/README.md
+++ b/01/README.md
@ -8,7 +8,7 @@ is the executable for this stage's compiler. Run it (it'll read from the file
 `Hello, world!` when run. Let's take a look at the input we're providing to the
 stage 01 compiler, `in01`:

-<pre><code>
+```
 || ELF Header
 ;im;01;00;00;00;00;00;00;00 file descriptor for stdout
 ;JA
@ -24,9 +24,9 @@ stage 01 compiler, `in01`:
 ;sy
 ;'H;'e;'l;'l;'o;',;' ;'w;'o;'r;'l;'d;'!;\n the string we're printing
 ;
-</code></pre>
+```

-Look at that! There are comments! Much nicer than just hexadecimal digit pairs.
+Look at that! There are even comments! Much nicer than just hexadecimal digit pairs.

 ## end result

@ -50,9 +50,9 @@ actually print out an error message and exit, rather than continuing as if
 nothing happened! Try adding `xx;` to the end of the file `in01`, and running
 `./out00`. You should get the error message:

-<pre><code>
+```
 xx not recognized.
-</code></pre>
+```

 Pretty cool, huh?
 Anyways let's see how this compiler actually works.
@ -63,7 +63,7 @@ Writing in our stage 00 language is much nicer than editing an
 executable, because it's easier to move things around, and also, we can separate
 our program into lines! Let's take a look at the start:

-<pre><code>
+```
 7f 45 4c 46
 02
 01
@ -90,7 +90,7 @@ a8 00 40 00 00 00 00 00
 00 10 02 00 00 00 00 00
 00 10 02 00 00 00 00 00
 00 10 00 00 00 00 00 00
-</code></pre>
+```

 This is the ELF header and program header. It's just like our last one, but with
 a couple of differences. First, our entry point is at offset 0xa8 instead of 0x78.
@ -113,7 +113,7 @@ recognized."`
 - `00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00` (unused)

 Here's the data for our program. As you can see from my annotations, we have the
-input and output file, as well as the error message. The command part of the
+input and output file names, as well as the error message. The command part of the
 error message is left blank for now (we'll fill it in when the code is actually
 run).

@ -182,8 +182,8 @@ program with exit code 0 (successful).
 - `48 01 d8` `add rax, rbx`

 This here looks at the two bytes we read in (we'll call them `b1` and `b2`) and
-computes `b1 * 128 + b2` (more specifically `(b1 << 7) + b2`). This is the index
-in our command table corresponding to the two characters from the input file.
+computes `b1 * 128 + b2` (more specifically `(b1 << 7) + b2`). This is the corresponding index
+in our command table.
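To make the indexing concrete, here's a rough Python sketch of the same arithmetic. The function names are made up for illustration; the 8-byte entry size comes from the `shl rax, 3` that follows.

```python
ENTRY_SIZE = 8  # bytes per command table entry (what shl rax, 3 multiplies by)

def command_index(b1: int, b2: int) -> int:
    """Index of a two-character command: (b1 << 7) + b2."""
    return (b1 << 7) + b2

def command_offset(command: str) -> int:
    """Byte offset of a command's entry from the start of the table."""
    b1, b2 = command.encode("ascii")
    return command_index(b1, b2) * ENTRY_SIZE

print(command_index(ord("s"), ord("y")))  # index for the "sy" command
print(command_offset("sy"))               # its byte offset into the table
```

Since both characters are ASCII (below 128), no two commands can land on the same index.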

 - `48 c1 e0 03` `shl rax, 3`
 - `48 89 c3` `mov rbx, rax`
@ -211,7 +211,7 @@ is `03 48 89 c3`. We set the length to 0 for unused entries.
 So this code checks if the entry for this command starts with a zero byte. If it
 does, that means the two characters we read in don't actually correspond to a
 real command. If that's the case, this next bit of code is executed (otherwise
-it's skiped over):
+it's skipped over):

 - `48 b8 02 00 00 00 00 00 00 00` `mov rax, 2 (stderr)`
 - `48 89 c7` `mov rdi, rax`
@ -228,7 +228,7 @@ it's skiped over):
 - `00 00 00 00 00 00 00 00 00 00 00 00 00 00` (unused)

 This prints our error message, now filled in with the specific unrecognized
-instruction, to standard error, and exits with code 1, to indicate failure.
+instruction, to standard error, then exits with code 1, to indicate failure.

 - `48 89 eb` `mov rbx, rax`
 - `31 c0` `mov rax, 0`
@ -273,7 +273,7 @@ all the way back to read the next command. Otherwise, we keep looping. This
 skips over any comments/whitespace we might have between a command and the
 following command.

-And that's all the *code* for this compiler. Next comes some data.
+And that's all the *code* for this compiler. Next comes the command table.

 First, there's a whole bunch of unused 0s. Then there's the line

@ -293,7 +293,7 @@ Which is the encoding of the `syscall` instruction.
 You can look through the rest of the table, if you want. But let's look at the
 very end:

-<code><pre>
+```
 78
 7f 45 4c 46
 02
@ -321,7 +321,7 @@ very end:
 00 00 08 00 00 00 00 00
 00 00 08 00 00 00 00 00
 00 10 00 00 00 00 00 00
-</code></pre>
+```

 This is at the position for `||`, and it contains an ELF header. One thing you
 might notice is that we decided that each entry is 8 bytes long, but this one is
@ -340,5 +340,5 @@ fixed this, but frankly I've had enough of writing code in hexadecimal. So let's
 move on to [stage 02](../02/README.md),
 now that we have a nicer language on our hands. From now
 on, since we have comments, I'm gonna do most of the explaining in the source file
-itself, rather than the README. But there'll still be a bit of stuff there each
+itself, rather than the README. But there'll still be some stuff there each
 time.
--- a/01/commands.txt
+++ b/01/commands.txt
@ -7,11 +7,12 @@ ff - Byte ff
 'a - Character a (byte 0x61)
 '! - Character ! (byte 0x21)
 etc.
+\n - Newline (byte 0x0a)

 zA - Zero rax
 im - Set rax to an immediate value, e.g.
-     im;05;00;00;00;00;00;00;00;
-	 will set rax to 5.
+        im;05;00;00;00;00;00;00;00;
+     will set rax to 5.

 ax	bx	cx	dx	sp	bp	si	di
 A	B	C	D	S	R	I	J
--- a/02/Makefile
+++ b/02/Makefile
@ -1,7 +1,9 @@
 all: out01 out02 README.html
 out01: in01
 	../01/out00
-out02: out01
+out02: out01 in02
 	./out01
 %.html: %.md ../markdown
 	../markdown $<
+clean:
+	rm -f out01 out02 README.html
--- a/02/README.md
+++ b/02/README.md
@ -1,13 +1,15 @@
 # stage 02

 The compiler for this stage is in the file `in01`, an input for our previous compiler.
-The specifics of how this compiler works are in the comments in that file, but here I'll
+So if you run `../01/out00`, you'll get the file `out01`, which is
+this stage's compiler.
+The specifics of how this compiler works are in the comments in `in01`, but here I'll
 give an overview.
 Let's take a look at `in02`, an example input file for this compiler:
 ```
 jm
 :-co   jump to code
-::hw
+::hw  start of hello world
 'H
 'e
 'l
@ -23,11 +25,12 @@ jm
 '!
 \n
 ::he  end of hello world
+
+
+
 ::co  start of code
-//
-// now we'll calculate the length of the hello world string
+// calculate the length of the hello world string
 // by subtracting hw from he.
-//
 im
 --he
 BA
@ -36,7 +39,7 @@ im
 nA
 +B
 DA   put length in rdx
-// okay now we can write it
+// okay now write it
 im
 ##1.
 JA    set rdi to 1 (stdout)
@ -54,56 +57,123 @@ im
 sy
 ```

-You can try adding more characters to the hello world message, and it'll just work;
-the length of the text is computed automatically!
+We can compile it by running `./out01`. This will produce
+the executable `out02`, which you can run. It prints
+`Hello, world!`.

-This time, commands are separated by newlines instead of semicolons.
-Each line begins with a 2-character command identifier. There are some special identifiers though:
+In this language,
+commands are separated by newlines instead of semicolons.
+Each line begins with a 2-character command.
+All of the commands from the previous compiler are here,
+plus six new ones:

 - `::` marks a *label*
 - `--` outputs a label's (absolute) address
 - `:-` outputs a label's relative address
 - `##` outputs a number
-
-All other commands work like they did in the previous compiler—if you scroll down in the
-`in01` source file, you'll see the full command table.
+- `//` is for comments
+- `\n\n` does nothing (used for spacing)

 ## labels

 Labels are the most important new feature of this language.
+A line like
+```
+::xy
+```
+associates the name `xy` with the address of the next byte of the program.
+In the example program, `hw` is associated with `0x40007d`, 
+which is the virtual memory address of the `Hello, world!` data.
+We can then use
+```
+--xy
+```
+to output that address, and
+```
+:-xy
+```
+to output it relative to the current address.
+So now instead of computing how far to jump, we can just jump to a label, e.g.
+```
+jm
+:-xy  (use the relative address, because jumps are relative in x86-64)
+```
+And instead of figuring out the address of a piece of data, we can just use its label:
+```
+im
+--xy
+// rax now points to the data at the label "::xy"
+```
+
+This also lets us compute the length of the hello world string automatically!
+By taking the address of the end of the string (`he`) and subtracting the
+start (`hw`), we get the length in bytes.
+So you can try adding more characters to the hello world message, and it'll just work.
+
+All labels must be two ASCII characters. The address of each label is stored
+as a 32-bit number in the "label table". This is sort of like the command table—the
+index of the label `xy` is `128 * x + y`. Specifically, the entry for `xy` is at
+`0x420000 + 4 * (128 * x + y)`, since the label table starts at `0x420000`
+and each entry is 4 bytes.
+When we encounter `::xy`, we get the current position in the output file
+(using `lseek`), add the address of the start of the file (`0x400000`), 
+and store that in the label table.
+When we encounter `:-xy` or `--xy`, we look up `xy` in the label table,
+and write the address (subtracting the current address for `:-`) to the output file.
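Here's a small Python sketch of that address arithmetic, just to illustrate. The function names are hypothetical; the constants `0x420000` and `0x400000` are the ones given above.

```python
LABEL_TABLE = 0x420000  # start of the label table
FILE_BASE = 0x400000    # virtual address where the output file starts
ENTRY_SIZE = 4          # each label address is a 32-bit number

def label_entry_address(label: str) -> int:
    """Where in memory the table entry for a two-character label lives."""
    x, y = label.encode("ascii")
    return LABEL_TABLE + ENTRY_SIZE * (128 * x + y)

def label_value(file_position: int) -> int:
    """The address stored for a label defined at this output file position."""
    return FILE_BASE + file_position

print(hex(label_entry_address("hw")))
print(hex(label_value(0x7d)))
```

The second call assumes the label sits `0x7d` bytes into the output file, which is what the `0x40007d` address of `hw` in the example implies.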

 ## two passes?

+This compiler actually needs to read through the source code,
+and output an executable, twice.
+This is because a label may be defined *after* it is used, e.g.:
+```
+jm
+:-aa   jump forward
+...
+::aa   this is where we're jumping to
+...
+```
+In the first pass, the `:-aa` will
+treat `aa` as having an address of 0. Then when
+we get to `::aa`, the address in the label table will be corrected.
+At the end of the first pass, we seek back to the start 
+of the input and output files,
+and run the exact same code for the second pass.
+But this time, the correct address of `aa` is used, namely the
+one we calculated in the first pass.
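The two-pass idea can be sketched in a few lines of Python. This is a toy model, not the actual compiler: the tuple format, the `run_pass` name, and the sizes in the example program are all made up for illustration, and it only models absolute (`--`) uses to keep things short.

```python
FILE_BASE = 0x400000

def run_pass(program, labels):
    """Run one pass; return the address written for each label use."""
    address = FILE_BASE
    written = []
    for kind, name, size in program:
        if kind == "define":
            labels[name] = address               # like ::aa
        elif kind == "use":
            written.append(labels.get(name, 0))  # like --aa; 0 if not seen yet
        address += size
    return written

# Toy program: use aa, then 5 bytes of other stuff, then define aa.
program = [("use", "aa", 8), ("other", None, 5), ("define", "aa", 0)]

labels = {}
first = run_pass(program, labels)
second = run_pass(program, labels)  # same code, but labels is now filled in
print([hex(a) for a in first], [hex(a) for a in second])
```

The first pass writes 0 for `aa`, and the second pass writes the address recorded during the first.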
+
+
 ## other features

 Now instead of writing out each of the 8 bytes making up a number,
-we can just write it in hexadecimal (e.g. `##3c.` for `3c 00 00 00 00 00 00 00`),
-and the compiler will automatically
-extend it to 8 bytes.
+we can just write it in hexadecimal, e.g. `##1c4.` for `c4 01 00 00 00 00 00 00`.
 This is especially nice because we don't need to write numbers backwards
 for little-endianness anymore!
-Numbers cannot appear at the end of a line (this was
-to make the compiler simpler to write), so I'm adding a `.` at the end of
+Numbers cannot appear at the end of a line (this made
+the compiler simpler to write), so I'm adding a `.` at the end of
 each one to avoid making that mistake.
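In Python terms, the conversion looks roughly like this (an illustrative sketch only; `encode_number` is a made-up name, and the real compiler does this digit by digit in machine code):

```python
def encode_number(text: str) -> bytes:
    """Turn the hex digits of a ## number into 8 little-endian bytes."""
    digits = text.rstrip(".")  # drop the trailing "."
    return int(digits, 16).to_bytes(8, "little")

print(encode_number("1c4.").hex(" "))
print(encode_number("1.").hex(" "))
```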

 Anything after a command is treated as a comment;
 additionally `//` can be used for comments on their own lines.
-I decided to implement them as simply as possible:
+I decided to implement this as simply as possible:
 I just added the command `//` to the command table, which outputs the byte `0x90`—this
-means "do nothing" (`nop`) in x86-64.
-Note that this means that the following code will not work as expected:
+means ["do nothing"](https://en.wikipedia.org/wiki/No-op)
+in x86-64.
+Note that the following code will not work as expected:
 ```
 im
 // load the value 0x333 into rax
 ##333.
 ```
-since `0x90` gets inserted between the "load immediate" instruction code, and the immediate.
+since `0x90` gets inserted between the "load immediate" instruction code and the immediate.
+`\n\n` works identically, and lets us space out code a bit. But be careful:
+the number of blank lines must be a multiple of 3!

 ## limitations

 Many of the limitations of our previous compilers apply to this one. Also,
 if you use a label without defining it, it uses address 0, rather than outputting
-an error message. This could be fixed: if the value in the label table is 0, and if we are
+an error message. This could be fixed: if the value in the label table is 0 and we are
 on the second pass, output an error message. This compiler was already tedious enough
 to implement, though! 
 But thanks to labels, for future compilers at least we won't have to calculate
--- a/02/in01
+++ b/02/in01
@ -3,7 +3,7 @@
 ;'i;'n;'0;'2;00   (0x40007d) input filename
 ;'o;'u;'t;'0;'2;00  (0x400082) output filename
 ;00;00;' ;'n;'o;'t;' ;'r;'e;'c;'o;'g;'n;'i;'z;'e;'d;\n;00;00;00;00;00;00 (0x400088) error message/where we read to
-;00 (0x4000a0) stores which pass we're on (1 for second pass)
+;00 (0x4000a0) stores which pass we're on (0 for first pass, 1 for second pass)
 ;00;00;00;00;00;00;00 
 ;00;00;00;00;00;00;00;00 (0x4000a8) used for output
 unused padding
@ -180,11 +180,11 @@ okay it's 0-9

 ;+B
 ;BA
-okay we now have a digit in RBX
+okay we now have a digit in rbx
 ;AR
 ;<I;04
 ;+B
-;RA    store away in RBP
+;RA    store away in rbp
 ;jm;38;ff;ff;ff  continue loop

 unused padding
@ -195,7 +195,7 @@ unused padding
 ;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00;00

-okay we have a full number in RBP, time to write it to the file
+okay we have a full number in rbp, time to write it to the file.
 start by putting it at address 0x4000a8
 ;im;a8;00;40;00;00;00;00;00
 ;BA
@ -210,7 +210,7 @@ now write
 ;IA
 ;im;08;00;00;00;00;00;00;00 write 8 bytes
 ;DA
-;im;01;00;00;00;00;00;00;00  write
+;im;01;00;00;00;00;00;00;00 write
 ;sy

 ;jm;c3;03;00;00 skip to newline
@ -327,11 +327,11 @@ subtract current address
 ;nA;+B 
 ;RA   store relative address in rbp

-now we want to write eax to the output file.
+now we want to write ebp to the output file.
 start by putting it at address 0x4000a8
 ;im;a8;00;40;00;00;00;00;00
 ;BA
-;AR  put relative address in rax
+;AR
 ;sd

 now write
@ -341,7 +341,7 @@ now write
 ;IA
 ;im;04;00;00;00;00;00;00;00 4 bytes
 ;DA
-;im;01;00;00;00;00;00;00;00  write
+;im;01;00;00;00;00;00;00;00 write
 ;sy

 ;jm;66;01;00;00 skip to newline
@ -368,7 +368,7 @@ it's not a label or a number. let's look it up in the instruction table.
 ;BA
 ;RA    store away address of command text in rbp
 ;zA;lb
-;DA    number of bytes to write (used for syscall if no error)
+;DA    number of bytes to write (used for syscall if command exists)
 ;BA
 ;zA
 ;cm;jn;54;00;00;00  check if # of bytes is 0, if not, skip outputting error
@ -392,7 +392,7 @@ this is a real command
 ;im;01;00;00;00;00;00;00;00 add 1 because we don't want to write the length 
 ;+B
 ;IA   address of data to write
-;im;04;00;00;00;00;00;00;00  out file descriptor
+;im;04;00;00;00;00;00;00;00 out file descriptor
 ;JA
 ;im;01;00;00;00;00;00;00;00 write
 ;sy
@ -1777,7 +1777,7 @@ the formatting changed appropriately.
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
-;00;00;00;00;00;00;00;00
+;01;90;00;00;00;00;00;00   \n\n
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
@ -6550,7 +6550,7 @@ the formatting changed appropriately.
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
-;01;90;00;00;00;00;00;00
+;01;90;00;00;00;00;00;00  // comments
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
 ;00;00;00;00;00;00;00;00
--- a/02/in02
+++ b/02/in02
@ -1,6 +1,6 @@
 jm
 :-co   jump to code
-::hw
+::hw  start of hello world
 'H
 'e
 'l
@ -16,11 +16,12 @@ jm
 '!
 \n
 ::he  end of hello world
+
+
+
 ::co  start of code
-//
-// now we'll calculate the length of the hello world string
+// calculate the length of the hello world string
 // by subtracting hw from he.
-//
 im
 --he
 BA
@ -29,7 +30,7 @@ im
 nA
 +B
 DA   put length in rdx
-// okay now we can write it
+// okay now write it
 im
 ##1.
 JA    set rdi to 1 (stdout)
--- a/Makefile
+++ b/Makefile
@ -2,6 +2,12 @@ all: markdown README.html
 	$(MAKE) -C 00
 	$(MAKE) -C 01
 	$(MAKE) -C 02
+clean:
+	$(MAKE) -C 00 clean
+	$(MAKE) -C 01 clean
+	$(MAKE) -C 02 clean
+	rm -f markdown
+	rm -f README.html
 markdown: markdown.c
 	$(CC) -O2 -o markdown -Wall -Wconversion -Wshadow -std=c89 markdown.c
 README.html: markdown README.md
--- a/README.md
+++ b/README.md
@ -17,7 +17,14 @@ Note that the executables produced in this series will only run on
 64-bit Linux, because each OS/architecture combination would need its own separate
 executable.

-The README for the first stage is [here](00/README.md).
+## table of contents
+
+- [stage 00](00/README.md) - a program converting a text file with 
+hexadecimal digit pairs to a binary file.
+- [stage 01](01/README.md) - a language with comments, and 2-character
+command codes.
+- [stage 02](02/README.md) - a language with labels.
+- more coming soon (hopefully)

 ## prerequisite knowledge

@ -44,8 +51,7 @@ decimal.
 - ASCII, null-terminated strings
 - how pointers work
 - how floating-point numbers work
- maybe some basic Intel-style x86-64 assembly (you can probably pick it up on
-the way though)
+- some basic Intel-style x86-64 assembly

 It will help you a lot to know how to program (with any programming language),
 but it's not strictly necessary.
@ -53,12 +59,11 @@ but it's not strictly necessary.
 ## instruction set

 x86-64 has a *gigantic* instruction set. The manual for it is over 2,000 pages
-long! So, it makes sense to select only a small subset of it to use for all the
-stages of our compiler. The set I've chosen can be found in `instructions.txt`.
+long! So it makes sense to select only a small subset of it to use.
+The set I've chosen can be found in `instructions.txt`.
 I think it achieves a pretty good balance between having few enough
 instructions to be manageable and having enough instructions to be useable.
-To be clear, you don't need to read that file to understand the series, at least
-not right away.
+To be clear, you don't need to read that file to understand the series.

 ## principles

@ -91,15 +96,15 @@ project can't necessarily even do that though, because the Linux kernel, which
 we depend on, is compiled from C, so we can't fully trust *it*. To *truly*
 create a fully trustable compiler, you'd need to manually write to a USB with a
 circuit, create an operating system from nothing (without even a text editor),
-and then follow this series, or maybe you don't even trust your CPU vendor...
-I'll leave that to someone else
+and then follow this series, or maybe you don't even trust your CPU...
+I'll leave that to someone else.

 ## license

 ```
 This project is in the public domain. Any copyright protections from any law
-for this project are forfeited by the author(s). No warranty is provided for
-this project, and the author(s) shall not be held liable in connection with it.
+are forfeited by the author(s). No warranty is provided, and the author(s)
+shall not be held liable in connection with it.
 ```

 ## contributing
--- a/instructions.txt
+++ b/instructions.txt
@ -101,3 +101,4 @@ syscall
 >0f 05
 nop
 >90
+(more will be added as needed)
--- a/markdown.c
+++ b/markdown.c
@ -58,7 +58,8 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
 		case '[': {
 			/* link */
 			char url2[256] = {0};
-			const char *label, *url, *label_end, *url_end, *dot;
+			const char *label, *url, *label_end, *url_end;
+			char *dot;
 			int n_label, n_url;

 			label = p+1;
@ -88,7 +89,7 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
 				/* replace links to md files with links to html files */
 				strcpy(dot, ".html");
 			}
-			fprintf(out, "<a href=\"%s\" target=\"_blank\">%.*s</a>",
+			fprintf(out, "<a href=\"%s\">%.*s</a>",
 				url2, n_label, label);
 			p = url_end;
 		} break;