# bootstrapping a (Linux x86-64) C compiler

Compilers nowadays are written in languages like C, which themselves need to be
compiled. But then, you need a C compiler to compile your C compiler! Of course,
the very first C compiler was not written in C.
First, people made assemblers, then simple programming languages,
then, eventually, it was possible to make a C compiler.
In this repository, we'll explore how that's done. Each directory here
is a "stage" in the process. The first one, `00`, is a hand-written
executable, and the last one, `05`, is a C compiler. Each directory has its own
README explaining what's going on.

You can run `bootstrap.sh` to run through and test every stage.
To get HTML versions of all README pages, run `make`.

Note that the executables produced in this series will only run on
x86-64 Linux, because each OS/architecture combination would need its own separate
executable.

## table of contents

- [stage 00](00/README.md) - a program converting a text file with 
hexadecimal digit pairs to a binary file.
- [stage 01](01/README.md) - a language with comments, and 2-character
command codes.
- [stage 02](02/README.md) - a language with labels
- [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
- [stage 04](04/README.md) - a language with nice functions and local variables
- [stage 04a](04a/README.md) - (interlude) a simple preprocessor
- [stage 05](05/README.md) - a C compiler capable of compiling TCC
- [stage 06](06/README.md) - an interpreter capable of executing zig
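
To give a rough feel for what stage 00 does, here is a minimal sketch in C of turning hexadecimal digit pairs into bytes. This is not the actual stage 00 program (which is a hand-written executable, described in its own README). The function names `hex_digit` and `hex_to_bytes` are made up for illustration, and the sketch assumes that anything that isn't a hex digit is just a separator.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert the digit pairs in `text` to bytes in `out`.
   Returns the number of bytes written, or -1 if a digit has no partner
   or `out` (which holds `capacity` bytes) is full. */
long hex_to_bytes(const char *text, unsigned char *out, size_t capacity) {
    size_t n = 0;
    while (*text != '\0') {
        int hi = hex_digit(*text);
        int lo;
        if (hi < 0) { /* not a digit: treat it as a separator (an assumption) */
            text++;
            continue;
        }
        lo = hex_digit(text[1]);
        if (lo < 0 || n == capacity) return -1;
        out[n++] = (unsigned char)(hi << 4 | lo);
        text += 2;
    }
    return (long)n;
}
```

For example, `hex_to_bytes("7f 45 4c 46", out, 8)` writes the four bytes 0x7f, 0x45, 0x4c, 0x46 to `out` and returns 4.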

## prerequisite knowledge

If you want to follow along with this series, you'll probably want to know about:

- number bases -- if a number is preceded by 0x, 0o, or 0b in this series, that
means hexadecimal/octal/binary respectively. So 0xff = FF hexadecimal = 255
decimal.
- bits, bytes, kilobytes, etc.
- bitwise operations (not, or, and, xor, left shift, right shift)
- 2's complement
- ASCII, null-terminated strings
- how pointers work
- how floating-point numbers work
- what a compiler is
- what an executable file is
- what a system call is
- what a CPU is
- what a CPU architecture is
- what a CPU register is
- what the (call) stack is
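
As a quick check of the number notation and of 2's complement, here is a small sketch in C. The helpers `parse_number` and `as_byte` are hypothetical names used only for this example; they are not part of any stage.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a number written the way this series writes them:
   0x = hexadecimal, 0o = octal, 0b = binary, otherwise decimal. */
uint64_t parse_number(const char *s) {
    int base = 10;
    if (s[0] == '0' && s[1] != '\0') {
        if (s[1] == 'x') { base = 16; s += 2; }
        else if (s[1] == 'o') { base = 8; s += 2; }
        else if (s[1] == 'b') { base = 2; s += 2; }
    }
    return strtoull(s, NULL, base);
}

/* 2's complement: the 8-bit pattern used to store a small signed number. */
uint8_t as_byte(int value) {
    return (uint8_t)value;
}
```

With these, `parse_number("0xff")` is 255, `parse_number("0b101")` is 5, and `as_byte(-1)` is 0xff.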

If you're unfamiliar with x86-64 assembly, you should take a look at the instruction list below.

## principles

- as simple as possible

Bootstrapping a compiler is not an easy task, so we're trying to make it as easy
as possible. We don't even necessarily need a standard-compliant C compiler; we
only need enough to compile someone else's C compiler. Specifically, we'll be
using [tcc](https://bellard.org/tcc/) since it's written (mostly) in C89.

- efficiency is not a concern

We will create big and slow executables, and that's okay. It doesn't really
matter if compiling TCC takes 30 seconds as opposed to 0.01 seconds; once
we compile it with itself, we should get the same executable either way.

## reflections on trusting trust

In 1984, Ken Thompson wrote the well-known article
[Reflections on Trusting Trust](http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf).
This is one of the inspirations for this project. A brief summary is:
it's possible to create a malicious C compiler which will
replicate its own malicious functionalities (e.g. detecting password-checking
routines to make them also accept another password the attacker knows) when used
to compile other C compilers. For all we know, such a compiler was used to
compile gcc, say, and so all programs around today could be compromised. Of
course, this is almost certainly not the case, but it's still an
interesting experiment to try to create a fully trustable compiler. This
project can't necessarily even do that though, because the Linux kernel, which
we depend on, is compiled from C, so we can't fully trust *it*. To
create a *fully* trustable compiler, you'd need to manually write 
an operating system to a USB key with a circuit or something,
assuming you trust your CPU...
I'll leave that to someone else.

## instruction set

x86-64 has a *gigantic* instruction set. The manual for it is over 2,000 pages
long! To make things simpler, we will only use a small subset.

Here are all the instructions we'll be using. If you're not familiar with
x86-64 assembly, you might want to look over these.

x86-64 has 16 integer registers: rax, rbx, rcx, rdx, rsp, rbp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15.
We will almost entirely be using the first 8 of these.
al refers to the bottom 8 bits of rax, likewise with bl, cl, dl;
ax refers to the bottom 16 bits of rax, likewise with bx, cx, dx;
eax refers to the bottom 32 bits of rax, likewise with ebx, ecx, edx.
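
If the sub-register names are new to you, it may help to think of them as masks over the full 64-bit value. A minimal sketch in C (the function names are invented for this example):

```c
#include <stdint.h>

/* Each sub-register is just the low bits of the full 64-bit register. */
uint8_t al_of(uint64_t rax) { return (uint8_t)(rax & 0xff); }          /* bottom 8 bits */
uint16_t ax_of(uint64_t rax) { return (uint16_t)(rax & 0xffff); }      /* bottom 16 bits */
uint32_t eax_of(uint64_t rax) { return (uint32_t)(rax & 0xffffffff); } /* bottom 32 bits */
```

So if rax holds 0x1122334455667788, then al is 0x88, ax is 0x7788, and eax is 0x55667788.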

x86-64 also has 16 floating-point registers: xmm0 through xmm15. We'll only be using
xmm0 and xmm1. These registers can hold either four 32-bit floating-point numbers (`float`s) or
two 64-bit floating-point numbers (`double`s), but we'll only be using them to hold either one
`float` or one `double`.

In the table below, `IMM64` means a 64-bit *immediate* (a constant number).
`rdx:rax` refers to the 128-bit number you get by combining `rdx` and `rax`.
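
To make `rdx:rax` a bit more concrete, here is a sketch in C that models the pair as a struct and imitates two of the instructions that use it. The type `RdxRax` and the functions `cqo_model` and `mul_model` are hypothetical names for this example only; the real instructions of course do all this in hardware.

```c
#include <stdint.h>

/* A 128-bit number split the way the CPU splits it. */
typedef struct {
    uint64_t rdx; /* upper 64 bits */
    uint64_t rax; /* lower 64 bits */
} RdxRax;

/* Like `cqo`: copy the sign bit of rax into every bit of rdx. */
RdxRax cqo_model(uint64_t rax) {
    RdxRax r;
    r.rax = rax;
    r.rdx = (rax >> 63) ? UINT64_MAX : 0;
    return r;
}

/* Like `mul rbx`: unsigned 64x64 -> 128-bit multiply, built from 32-bit pieces. */
RdxRax mul_model(uint64_t rax, uint64_t rbx) {
    uint64_t a_lo = rax & 0xffffffff, a_hi = rax >> 32;
    uint64_t b_lo = rbx & 0xffffffff, b_hi = rbx >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    /* middle 32-bit column, including the carry out of the lowest column */
    uint64_t mid = (lo_lo >> 32) + (hi_lo & 0xffffffff) + (lo_hi & 0xffffffff);
    RdxRax r;
    r.rax = (mid << 32) | (lo_lo & 0xffffffff);
    r.rdx = hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
    return r;
}
```

For instance, multiplying 0xffffffffffffffff by 2 doesn't fit in 64 bits: `mul_model` gives rdx = 1 and rax = 0xfffffffffffffffe.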

```
REGISTER NUMBERS (used for DEST and SRC in the table below)
ax  bx  cx  dx  sp  bp  si  di
0   3   1   2   4   5   6   7

┌──────────────────────┬───────────────────┬────────────────────────────────────────┐
│ Instruction          │ Encoding          │ Description                            │
├──────────────────────┼───────────────────┼────────────────────────────────────────┤
│ mov rax, IMM64       │ 48 b8 IMM64       │ set rax to the 64-bit value IMM64      │
│ mov rbx, IMM64       │ 48 bb IMM64       │ set rbx to the 64-bit value IMM64      │
│ add rax, IMM32       │ 48 05 IMM32       │ add IMM32 (signed) to rax              │
│ xor eax, eax         │ 31 c0             │ set rax to 0 (shorter than mov rax, 0) │
│ xor edx, edx         │ 31 d2             │ set rdx to 0                           │
│ mov RDEST, RSRC      │ 48 89 (DEST|SRC<<3|0xc0) │ set register DEST to current    │
│                      │                          │ value of register SRC           │
│ mov r8, rax          │ 49 89 c0          │ set r8 to rax (only used for syscalls) │
│ mov r9, rax          │ 49 89 c1          │ set r9 to rax (only used for syscalls) │
│ mov r10, rax         │ 49 89 c2          │ set r10 to rax (only used for syscalls)│
│ movsx rax, al        │ 48 0f be c0       │ sign-extend al to rax                  │
│ movsx rax, ax        │ 48 0f bf c0       │ sign-extend ax to rax                  │
│ movsx rax, eax       │ 48 63 c0          │ sign-extend eax to rax                 │
│ movzx rax, al        │ 48 0f b6 c0       │ zero-extend al to rax                  │
│ movzx rax, ax        │ 48 0f b7 c0       │ zero-extend ax to rax                  │
│ mov eax, eax         │ 89 c0             │ zero-extend eax to rax                 │
│ xchg rax, rbx        │ 48 93             │ exchange the values of rax and rbx     │
│ mov [rbx], rax       │ 48 89 03          │ store rax as 8 bytes at address rbx    │
│ mov rax, [rbx]       │ 48 8b 03          │ load 8 bytes from address rbx into rax │
│ mov [rbx], eax       │ 89 03             │ store eax as 4 bytes at address rbx    │
│ mov eax, [rbx]       │ 8b 03             │ load 4 bytes from address rbx into eax │
│ mov [rbx], ax        │ 66 89 03          │ store ax as 2 bytes at address rbx     │
│ mov ax, [rbx]        │ 66 8b 03          │ load 2 bytes from address rbx into ax  │
│ mov [rbx], al        │ 88 03             │ store al as 1 byte at address rbx      │
│ mov al, [rbx]        │ 8a 03             │ load 1 byte from address rbx into al   │
│ mov rax, [rbp+IMM32] │ 48 8b 85 IMM32    │ load 8 bytes from address rbp+IMM32    │
│                      │                   │ into rax (note: IMM32 may be negative) │
│ mov rax, [rsp+IMM32] │ 48 8b 84 24 IMM32 │ load 8 bytes from rsp+IMM32 into rax   │
│ mov [rbp+IMM32], rax │ 48 89 85 IMM32    │ store rax in 8 bytes at rbp+IMM32      │
│ mov [rsp+IMM32], rax │ 48 89 84 24 IMM32 │ store rax in 8 bytes at rsp+IMM32      │
│ mov [rsp], rbp       │ 48 89 2c 24       │ store rbp in 8 bytes at rsp            │
│ mov rbp, [rsp]       │ 48 8b 2c 24       │ load 8 bytes from rsp into rbp         │
│ lea rax, [rbp+IMM32] │ 48 8d 85 IMM32    │ set rax to rbp+IMM32                   │
│ lea rsp, [rbp+IMM32] │ 48 8d a5 IMM32    │ set rsp to rbp+IMM32                   │
│ int3                 │ cc                │ raise trap signal; useful for debugging│
│ movsq                │ 48 a5             │ copy 8 bytes from rsi to rdi           │
│ rep movsb            │ f3 a4             │ copy rcx bytes from rsi to rdi         │
│ push rax             │ 50                │ push rax onto the stack                │
│ pop rax              │ 58                │ pop a value off the stack into rax     │
│ neg rax              │ 48 f7 d8          │ set rax to -rax                        │
│ add rax, rbx         │ 48 01 d8          │ add rbx to rax                         │
│ sub rax, rbx         │ 48 29 d8          │ subtract rbx from rax                  │
│ imul rbx             │ 48 f7 eb          │ set rdx:rax to rax * rbx (signed)      │
│ cqo                  │ 48 99             │ sign-extend rax to rdx:rax             │
│ idiv rbx             │ 48 f7 fb          │ divide rdx:rax by rbx (signed); put    │
│                      │                   │    quotient in rax, remainder in rdx   │
│ mul rbx              │ 48 f7 e3          │ like imul, but unsigned                │
│ div rbx              │ 48 f7 f3          │ like idiv, but unsigned                │
│ not rax              │ 48 f7 d0          │ set rax to ~rax (bitwise not)          │
│ and rax, rbx         │ 48 21 d8          │ set rax to rax & rbx (bitwise and)     │
│ or rax, rbx          │ 48 09 d8          │ set rax to rax | rbx (bitwise or)      │
│ xor rax, rbx         │ 48 31 d8          │ set rax to rax ^ rbx (bitwise xor)     │
│ shl rax, cl          │ 48 d3 e0          │ set rax to rax << cl (left shift)      │
│ shl rax, IMM8        │ 48 c1 e0 IMM8     │ set rax to rax << IMM8                 │
│ shr rax, cl          │ 48 d3 e8          │ set rax to rax >> cl (unsigned)        │
│ shr rax, IMM8        │ 48 c1 e8 IMM8     │ set rax to rax >> IMM8 (unsigned)      │
│ sar rax, cl          │ 48 d3 f8          │ set rax to rax >> cl (signed)          │
│ sar rax, IMM8        │ 48 c1 f8 IMM8     │ set rax to rax >> IMM8 (signed)        │
│ sub rsp, IMM32       │ 48 81 ec IMM32    │ subtract IMM32 from rsp                │
│ add rsp, IMM32       │ 48 81 c4 IMM32    │ add IMM32 to rsp                       │
│ cmp rax, rbx         │ 48 39 d8          │ compare rax with rbx (see je, jl, etc.)│
│ test rax, rax        │ 48 85 c0          │ equivalent to cmp rax, 0               │
│ jmp IMM32            │ e9 IMM32          │ jump to offset IMM32 from here         │
│ je IMM32             │ 0f 84 IMM32       │ jump to IMM32 if equal                 │
│ jne IMM32            │ 0f 85 IMM32       │ jump if not equal                      │
│ jl IMM32             │ 0f 8c IMM32       │ jump if less than                      │
│ jg IMM32             │ 0f 8f IMM32       │ jump if greater than                   │
│ jle IMM32            │ 0f 8e IMM32       │ jump if less than or equal to          │
│ jge IMM32            │ 0f 8d IMM32       │ jump if greater than or equal to       │
│ jb IMM32             │ 0f 82 IMM32       │ jump if "below" (like jl but unsigned) │
│ ja IMM32             │ 0f 87 IMM32       │ jump if "above" (like jg but unsigned) │
│ jbe IMM32            │ 0f 86 IMM32       │ jump if below or equal to              │
│ jae IMM32            │ 0f 83 IMM32       │ jump if above or equal to              │
│ sete al              │ 0f 94 c0          │ set al to 1 if equal; 0 otherwise      │
│ setne al             │ 0f 95 c0          │ set al to 1 if not equal               │
│ setl al              │ 0f 9c c0          │ set al to 1 if less than               │
│ setg al              │ 0f 9f c0          │ set al to 1 if greater than            │
│ setle al             │ 0f 9e c0          │ set al to 1 if less than or equal to   │
│ setge al             │ 0f 9d c0          │ set al to 1 if greater than or equal to│
│ setb al              │ 0f 92 c0          │ set al to 1 if below                   │
│ seta al              │ 0f 97 c0          │ set al to 1 if above                   │
│ setbe al             │ 0f 96 c0          │ set al to 1 if below or equal to       │
│ setae al             │ 0f 93 c0          │ set al to 1 if above or equal to       │
│ movq rax, xmm0       │ 66 48 0f 7e c0    │ set rax to xmm0                        │
│ movq xmm0, rax       │ 66 48 0f 6e c0    │ set xmm0 to rax                        │
│ movq xmm1, rax       │ 66 48 0f 6e c8    │ set xmm1 to rax                        │
│ movq xmm1, xmm0      │ f3 0f 7e c8       │ set xmm1 to xmm0                       │
│ cvtss2sd xmm0, xmm0  │ f3 0f 5a c0       │ convert xmm0 from float to double      │
│ cvtsd2ss xmm0, xmm0  │ f2 0f 5a c0       │ convert xmm0 from double to float      │
│ cvttsd2si rax, xmm0  │ f2 48 0f 2c c0    │ convert double in xmm0 to int in rax   │
│ cvtsi2sd xmm0, rax   │ f2 48 0f 2a c0    │ convert int in rax to double in xmm0   │
│ comisd xmm0, xmm1    │ 66 0f 2f c1       │ compare xmm0 and xmm1                  │
│ addsd xmm0, xmm1     │ f2 0f 58 c1       │ add xmm1 to xmm0                       │
│ subsd xmm0, xmm1     │ f2 0f 5c c1       │ subtract xmm1 from xmm0                │
│ mulsd xmm0, xmm1     │ f2 0f 59 c1       │ multiply xmm0 by xmm1                  │
│ divsd xmm0, xmm1     │ f2 0f 5e c1       │ divide xmm0 by xmm1                    │
│ call rax             │ ff d0             │ call the function at address rax       │
│ ret                  │ c3                │ return from function                   │
│ syscall              │ 0f 05             │ execute a system call                  │
│ nop                  │ 90                │ do nothing                             │
└──────────────────────┴───────────────────┴────────────────────────────────────────┘

SYSCALLS
Arguments are passed in
	rdi, rsi, rdx, r10, r8, r9
The return value is placed in rax.
The values of rsp, rbp and rbx are preserved, but other registers might change.
```
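
To see how the encodings and the syscall convention fit together, here is a sketch in C that writes out the machine code for "exit with a given status". It is only an illustration, not how any stage in this repository is implemented. The `emit_*` functions and the `R_*` names are invented for the example, and it relies on two facts not stated in the table: immediates are stored lowest byte first, and `exit` is syscall number 60 on x86-64 Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers, as listed above the table. */
enum { R_RAX = 0, R_RCX = 1, R_RDX = 2, R_RBX = 3,
       R_RSP = 4, R_RBP = 5, R_RSI = 6, R_RDI = 7 };

/* mov rax, IMM64: 48 b8, then the 8 bytes of IMM64, lowest byte first. */
size_t emit_mov_rax_imm64(uint8_t *out, uint64_t imm) {
    int i;
    out[0] = 0x48;
    out[1] = 0xb8;
    for (i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));
    return 10;
}

/* mov RDEST, RSRC: 48 89 (DEST | SRC<<3 | 0xc0) */
size_t emit_mov_reg_reg(uint8_t *out, int dest, int src) {
    out[0] = 0x48;
    out[1] = 0x89;
    out[2] = (uint8_t)(dest | src << 3 | 0xc0);
    return 3;
}

/* syscall: 0f 05 */
size_t emit_syscall(uint8_t *out) {
    out[0] = 0x0f;
    out[1] = 0x05;
    return 2;
}

/* Code for exit(status): the first argument goes in rdi, the syscall
   number (60, assumed here for exit on x86-64 Linux) goes in rax.
   `out` needs room for 25 bytes; returns the number of bytes written. */
size_t emit_exit(uint8_t *out, uint64_t status) {
    size_t n = 0;
    n += emit_mov_rax_imm64(out + n, status);     /* mov rax, status */
    n += emit_mov_reg_reg(out + n, R_RDI, R_RAX); /* mov rdi, rax */
    n += emit_mov_rax_imm64(out + n, 60);         /* mov rax, 60 */
    n += emit_syscall(out + n);                   /* syscall */
    return n;
}
```

Calling `emit_exit(buf, 42)` produces 25 bytes: `48 b8 2a 00 00 00 00 00 00 00`, `48 89 c7`, `48 b8 3c 00 00 00 00 00 00 00`, `0f 05`.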

## license

The license below does not apply to tcc's or musl's source code.

```
This project is in the public domain. Any copyright protections from any law
are forfeited by the author(s). No warranty is provided, and the author(s)
shall not be held liable in connection with it.
```

## contributing

If you notice a mistake or want to clarify something, you can submit a pull request
via GitHub, or email `pommicket at pommicket.com`.