finish 05

pommicket 2022-02-19 19:43:13 -08:00
parent 54a191a117
commit a8c884e6cd
6 changed files with 747 additions and 35 deletions


@ -9,7 +9,7 @@ $ make
```
to build our C compiler and TCC. This will take some time (approx. 25 seconds on my computer).
This also compiles a "Hello, world!" with our compiler, `a.out`.
This also compiles a "Hello, world!" executable, `a.out`, with our compiler.
We can now compile TCC with itself. But first, you'll need to install the header files and library files
which are needed to compile (almost) any program with TCC:
@ -20,8 +20,8 @@ $ sudo make install-tcc0
The files will be installed to `/usr/local/lib/tcc-bootstrap`. If you want to change this, make sure to change
both the `TCCINST` variable in the makefile, and the `CONFIG_TCCDIR` macro in `config.h`.
Anyways, once this installation is done, you should be able to compile any C program with `tcc-0.9.27/tcc0`!
We can even compile TCC with itself:
Anyways, once this installation is done, you should be able to compile any C program with `tcc-0.9.27/tcc0`,
including TCC itself:
```
$ cd tcc-0.9.27
@ -44,7 +44,7 @@ $ diff tcc1 tcc1a
Binary files tcc1 and tcc1a differ
```
!!! Is there some malicious code hiding in the difference between these two files? Well, unfortunately (or fortunately, rather) the
!!! Is there some malicious code hiding in the difference between these two files? Well unfortunately (fortunately, really) the
truth is more boring than that:
```
@ -267,48 +267,67 @@ our header files use, and then we define each of the necessary C standard librar
## limitations
There are various minor ways in which this compiler doesn't actually handle all of C89.
Here is a list of things we do wrong (this list is probably missing things, though):
Here is a (probably incomplete) list of things we do wrong:
- [trigraphs](https://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C) are not handled
- `char[]` string literal initializers can't contain null characters (e.g. `char x[] = "a\0b";` doesn't work)
- you can only access members of l-values (e.g. `int x = function_which_returns_struct().member` doesn't work)
- you can only access members of l-values (e.g. `int x = function_which_returns_struct().member;` doesn't work)
- no default-int (this is a legacy feature of C, e.g. `main() {}` can technically stand in for `int main() {}`)
- the keyword `auto` is not handled (again, a legacy feature of C)
- `default:` must be the last label in a switch statement.
- external variable declarations are ignored (e.g. `extern int x; int main() { return x; } int x = 5; ` doesn't work)
- `typedef`s, and `struct`/`union`/`enum` declarations aren't allowed inside functions
- `default:` must come after all `case` labels in a switch statement.
- external variable declarations are ignored, and global variables can only be declared once
(e.g. `extern int x; int main() { return x; } int x = 5; ` doesn't work)
- `typedef`s, and `struct`/`union`/`enum` definitions aren't allowed inside functions
- conditional expressions aren't allowed inside `case` (horribly, `switch (x) { case 5 ? 6 : 3: ; }` is legal C).
- bit-fields aren't handled
- Technically, `1[array]` is equivalent to `array[1]`, but we don't handle that.
- C89 has *very* weird typing rules about `void*`/`non-void*` inside conditional expressions. We don't handle that properly.
- C89 allows calling functions without declaring them, for legacy reasons. We don't handle that.
- Floating-point constant expressions are very limited. Only `double` literals and 0 are supported.
- Floating-point literals can't have their integer part greater than 2<sup>64</sup>-1.
- In floating-point literals, the numbers before and after the decimal point must be less than 2<sup>64</sup>.
- The only "address constants" we allow are string literals, e.g. `int y, *x = &y;` is not allowed as a global declaration.
- Redefining a macro is always an error, even if it's the same definition.
- You can't have a variable/function/etc. called `defined`.
- Various little things about when macros are evaluated in some contexts.
- The horrible, horrible, function `setjmp`, which surely no one uses is not properly supported.
- The horrible, horrible function `setjmp`, which surely no one uses, is not properly supported.
Oh wait, TCC uses it. Fortunately it's not critically important to TCC.
- `wchar_t` and wide character string literals are not supported.
- Wide characters and wide character strings are not supported.
- The `localtime()` function assumes you are in the UTC+0 timezone.
- `mktime()` always fails.
Also, the keywords `signed`, `volatile`, `register`, and `const` are all ignored. This shouldn't have an effect
on any legal C program, though.
- The keywords `signed`, `volatile`, `register`, and `const` are all ignored, but this should almost never
have an effect on a legal C program.
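To make a few of the limitations above concrete, here is a small sketch of constructs that are legal C but which, according to the list, this stage's compiler does not accept. The names `classify`, `second_element`, `with_nul` and `with_nul_size` are made up for illustration and do not appear in the project.

```c
#include <assert.h>

/* `default:` may legally appear before the `case` labels, and a case
   constant may be a conditional expression (5 ? 6 : 3 evaluates to 6). */
int classify(int x) {
    int result;
    switch (x) {
    default:
        result = -1;
        break;
    case 0:
        result = 10;
        break;
    case 5 ? 6 : 3: /* same as `case 6:` */
        result = 20;
        break;
    }
    return result;
}

/* `1[array]` means *(1 + array), i.e. the same element as array[1]. */
int second_element(const int *array) {
    assert(array != 0);
    return 1[array];
}

/* A char[] initializer with an embedded null character.
   The array holds 'a', '\0', 'b', '\0', so its size is 4. */
static char with_nul[] = "a\0b";

unsigned long with_nul_size(void) {
    return (unsigned long)sizeof with_nul;
}
```

A standard-conforming compiler such as GCC or the final self-compiled TCC should accept all of this; with this stage's compiler, each construct would have to be rewritten (e.g. moving `default:` after the `case` labels, as was done in TCC's source).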
## anecdotes
Making this C compiler took over a month. Here are some interesting things
which happened along the way:
- A very difficult part of this compiler was parsing floating-point numbers in a language which
doesn't have floats. Originally, there was a bug where negative powers of 2 were
- Writing code to parse floating-point numbers in a language which
doesn't have floats turned out to be quite a fun challenge!
Not all decimal numbers have a perfect floating point representation. You could
round 0.1 up to ~0.1000000000000000056, or down to ~0.0999999999999999917.
This stage's C compiler should be entirely correct, up to rounding (which is all that the
C standard requires).
But typically C compilers
will round to whichever is closest to the decimal value. Implementing this correctly
is a lot harder than you might expect. For example,
```
0.09999999999999999861222121921855432447046041488647460937499
rounds down, but
0.09999999999999999861222121921855432447046041488647460937501
rounds up.
```
Good luck writing a function which handles that!
- Originally, there was a bug where negative powers of 2 were
being interpreted as half of their actual value, e.g. `x = 0.25;` would set `x` to
`0.125`, but `x = 4;`, `x = 0.3;`, etc. would all work just fine.
- Writing the functions in `math.h`, although probably not necessary for compiling TCC,
was fun! There are quite a few interesting optimizations you can make, and little
tricks for avoiding losses in floating-point accuracy.
- The <s>first</s> second non-trivial program I successfully compiled worked perfectly the first time I ran it!
- A bug that was very difficult to track down appeared the first time I ran `tcc`: there was a declaration along
the lines of `char x[] = "a\0b\0c";` but it got compiled as `char x[] = "a";`!
- Originally, I was just treating labels as statements, but `tcc` actually has code like:
- Originally, I was just treating labels the same as any other statements, but `tcc` actually has code like:
```
...
goto lbl;
@ -318,7 +337,7 @@ if (some_condition)
```
so the `do_something();` was not being considered as part of the `if` statement.
- The first time I compiled tcc with itself (and then with itself again), I actually got a different
executable. After spending a long time looking at disassemblies, I found the culprit:
executable from the GCC one. After spending a long time looking at disassemblies, I found the culprit:
```
# if defined(__linux__)
tcc_define_symbol(s, "__linux__", NULL);
@ -332,6 +351,43 @@ with itself!
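The rounding anecdote above can be checked against a C library. This is only a sketch: it assumes a `strtod` that rounds to the nearest double (glibc's does, as far as I know, but the C standard doesn't require it), and the names `parse_double`, `just_below`, `just_above` and `rounds_apart` are invented for this example. The two strings are the ones quoted in the anecdote; they sit just below and just above the midpoint between the double nearest to 0.1 and the double before it.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a decimal string with the C library's strtod. */
double parse_double(const char *s) {
    char *end;
    double value = strtod(s, &end);
    assert(end != s); /* at least one character must have been consumed */
    return value;
}

/* Just below the midpoint of the two doubles surrounding 0.1 ... */
const char *const just_below =
    "0.09999999999999999861222121921855432447046041488647460937499";
/* ... and just above it. */
const char *const just_above =
    "0.09999999999999999861222121921855432447046041488647460937501";

/* Returns 1 if the two strings parse to different doubles and the
   larger one is the same double that the literal 0.1 produces. */
int rounds_apart(void) {
    double lo = parse_double(just_below);
    double hi = parse_double(just_above);
    return lo < hi && hi == 0.1;
}
```

On a library with correctly rounded parsing, `rounds_apart()` returns 1: a difference in the 59th decimal place decides which double you get, which is why the parser has to look at every digit.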
## modifications of tcc's source code
Some modifications were needed to bring tcc's source code in line with what our compiler expects.
You can find a full list of modifications in `diffs.txt`, but I'll provide an overview (and explanation)
here.
- First, we don't allow a comma after the last member in an initializer (C89 itself only forbids it in enum definitions). In several places,
the last comma in an initializer/enum definition was removed, or an irrelevant entry was added to the end.
- Global variables were sometimes declared twice, which we don't support.
So, a bunch of duplicate declarations were removed.
- The `# if defined(__linux__)` and `# endif` mentioned above were removed.
- In a bunch of places, `ELFW(something)` had to be replaced with `ELF64_something` due to
subtleties of how we evaluate macros.
- `offsetof(type, member)` isn't considered a constant expression by our compiler, so
some initializers were replaced by functions called at the top of `main`.
- In several places, `default:` had to be moved to after every `case` label.
- In two places, `-some_long_double_expression` had to be replaced with
a function call to `negate_ld` (a function I wrote for negating long doubles).
This is because TCC only supports negating long doubles if
the compiler used to compile it has an 80-bit long double type, which our compiler doesn't.
- `\0` was replaced with `\n` as a separator for keyword names.
- Forced TCC to use `R_X86_64_PC32` relocations, because its `plt` code doesn't seem to work for static
executables.
- Lastly, there's the `config.h` file, which is normally produced by TCC's `configure` script,
but it's easy to write one manually:
```
#define TCC_VERSION "0.9.27"
#define CONFIG_TCC_STATIC 1
#define TCC_TARGET_X86_64 1
#define ONE_SOURCE 1
#define CONFIG_LDDIR "lib/x86_64-linux-gnu"
#define CONFIG_TCCDIR "/usr/local/lib/tcc-bootstrap"
#define inline
```
The last line causes the `inline` keyword (added in C99) to be ignored.
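As an illustration of the `negate_ld` item above, here is a minimal sketch of negating a long double by flipping its sign bit. It is not the function from `diffs.txt` (which casts the pointer to `unsigned long long *`); this variant copies the bytes with `memcpy` instead, and the name `negate_ld_sketch` is made up. It assumes the little-endian x87 80-bit format when `LDBL_MANT_DIG == 64`, where the sign is the top bit of byte 9, and falls back to ordinary negation otherwise.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

long double negate_ld_sketch(long double d) {
#if LDBL_MANT_DIG == 64
    /* Assumed layout (x86, little-endian): bytes 0-7 hold the mantissa,
       bytes 8-9 hold the exponent and, in the top bit of byte 9, the sign. */
    unsigned char bytes[sizeof(long double)];
    assert(sizeof bytes >= 10);
    memcpy(bytes, &d, sizeof bytes);
    bytes[9] ^= 0x80; /* flip the sign bit */
    memcpy(&d, bytes, sizeof bytes);
    return d;
#else
    /* No 80-bit long double here: plain negation is all we can do. */
    return -d;
#endif
}
```

For example, `negate_ld_sketch(1.5L)` gives `-1.5L`, and applying it twice gives back the original value. The point of doing it bitwise is that the compiler building TCC never has to generate a long double negation itself.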
Fewer changes would've been needed for an older version of TCC, but older versions didn't support
x86-64 assembly, which might end up being relevant...
## \*the nightmare begins
@ -345,12 +401,13 @@ if there a security bug were found in `printf`, it would be much easier to repla
every program which uses `printf`.
Now this library file is itself compiled from C source files (typically glibc).
So, we *can't* really say that the self-compiled TCC was built from scratch. And there could be malicious
So, we can't really say that the self-compiled TCC was built from scratch. And there could be malicious
self-replicating code in glibc!
So, why not just compile glibc with TCC?
Well, it's not actually possible. glibc can pretty much only be compiled with GCC. And we can't compile GCC
without a libc. Hmm...
Well, it's not actually possible. glibc can pretty much only be compiled with GCC.
This stage's C compiler definitely can't compile GCC, so we'll need a libc implementation to
compile GCC. Hmm...
Other libc implementations don't seem to like TCC either, so it seems that the only option left is to
make a new libc implementation, use that to compile GCC (probably an old version of it which TCC can compile),

05/diffs.txt Normal file

@ -0,0 +1,647 @@
---- arm-asm.c ----
---- arm-gen.c ----
---- arm-link.c ----
---- arm64-gen.c ----
---- arm64-link.c ----
---- c67-gen.c ----
---- c67-link.c ----
---- conftest.c ----
---- i386-asm.c ----
209c209
< 0x0f, /* g */
---
> 0x0f /* g */
238c238
< { 0, },
---
> { 0 }
252a253,254
> /* last operation */
> 0
1576,1578d1577
< default:
< reg = TOK_ASM_eax + reg;
< break;
1583a1583,1585
> default:
> reg = TOK_ASM_eax + reg;
> break;
---- i386-gen.c ----
---- i386-link.c ----
---- il-gen.c ----
---- libtcc.c ----
27c27
< ST_DATA int gnu_ext = 1;
---
> //ST_DATA int gnu_ext = 1;
30c30
< ST_DATA int tcc_ext = 1;
---
> //ST_DATA int tcc_ext = 1;
33c33
< ST_DATA struct TCCState *tcc_state;
---
> //ST_DATA struct TCCState *tcc_state;
820c820
< # if defined(__linux__)
---
> //# if defined(__linux__)
823c823
< # endif
---
> //# endif
1177c1177
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1552c1552
< { NULL, 0, 0 },
---
> { NULL, 0, 0 }
1555c1555
< static const FlagDef options_W[] = {
---
> static FlagDef options_W[] = {
1557,1562c1557,1561
< { offsetof(TCCState, warn_unsupported), 0, "unsupported" },
< { offsetof(TCCState, warn_write_strings), 0, "write-strings" },
< { offsetof(TCCState, warn_error), 0, "error" },
< { offsetof(TCCState, warn_gcc_compat), 0, "gcc-compat" },
< { offsetof(TCCState, warn_implicit_function_declaration), WD_ALL,
< "implicit-function-declaration" },
---
> { 0, 0, "unsupported" },
> { 0, 0, "write-strings" },
> { 0, 0, "error" },
> { 0, 0, "gcc-compat" },
> { 0, WD_ALL, "implicit-function-declaration" },
1566,1572c1565,1571
< static const FlagDef options_f[] = {
< { offsetof(TCCState, char_is_unsigned), 0, "unsigned-char" },
< { offsetof(TCCState, char_is_unsigned), FD_INVERT, "signed-char" },
< { offsetof(TCCState, nocommon), FD_INVERT, "common" },
< { offsetof(TCCState, leading_underscore), 0, "leading-underscore" },
< { offsetof(TCCState, ms_extensions), 0, "ms-extensions" },
< { offsetof(TCCState, dollars_in_identifiers), 0, "dollars-in-identifiers" },
---
> static FlagDef options_f[] = {
> { 0, 0, "unsigned-char" },
> { 0, FD_INVERT, "signed-char" },
> { 0, FD_INVERT, "common" },
> { 0, 0, "leading-underscore" },
> { 0, 0, "ms-extensions" },
> { 0, 0, "dollars-in-identifiers" },
1576,1577c1575,1576
< static const FlagDef options_m[] = {
< { offsetof(TCCState, ms_bitfields), 0, "ms-bitfields" },
---
> static FlagDef options_m[] = {
> { 0, 0, "ms-bitfields" },
1579c1578
< { offsetof(TCCState, nosse), FD_INVERT, "sse" },
---
> { 0, FD_INVERT, "sse" },
1582a1582,1599
>
> void _init_options(void) {
> options_W[1].offset = offsetof(TCCState, warn_unsupported);
> options_W[2].offset = offsetof(TCCState, warn_write_strings);
> options_W[3].offset = offsetof(TCCState, warn_error);
> options_W[4].offset = offsetof(TCCState, warn_gcc_compat);
> options_W[5].offset = offsetof(TCCState, warn_implicit_function_declaration);
> options_f[0].offset = offsetof(TCCState, char_is_unsigned);
> options_f[1].offset = offsetof(TCCState, char_is_unsigned);
> options_f[2].offset = offsetof(TCCState, nocommon);
> options_f[3].offset = offsetof(TCCState, leading_underscore);
> options_f[4].offset = offsetof(TCCState, ms_extensions);
> options_f[5].offset = offsetof(TCCState, dollars_in_identifiers);
> options_m[0].offset = offsetof(TCCState, ms_bitfields);
> #ifdef TCC_TARGET_X86_64
> options_m[1].offset = offsetof(TCCState, nosse);
> #endif
> }
---- tcc.c ----
239c239
< #else
---
> #elif 0
242a243,244
> #else
> return 0;
254c256
<
---
> _init_options();
---- tccasm.c ----
222d221
< default:
223a223
> default:
251d250
< default:
252a252
> default:
---- tcccoff.c ----
---- tccelf.c ----
28a29
> #if 0
43a45
> #endif
171,172c173,174
< && ELFW(ST_BIND)(sym->st_info) == STB_LOCAL)
< sym->st_info = ELFW(ST_INFO)(STB_GLOBAL, ELFW(ST_TYPE)(sym->st_info));
---
> && ELF64_ST_BIND(sym->st_info) == STB_LOCAL)
> sym->st_info = ELF64_ST_INFO(STB_GLOBAL, ELF64_ST_TYPE(sym->st_info));
183c185
< int n = ELFW(R_SYM)(rel->r_info) - first_sym;
---
> int n = ELF64_R_SYM(rel->r_info) - first_sym;
185c187
< rel->r_info = ELFW(R_INFO)(tr[n], ELFW(R_TYPE)(rel->r_info));
---
> rel->r_info = ELF64_R_INFO(tr[n], ELF64_R_TYPE(rel->r_info));
375c377
< if (ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
---
> if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
415c417
< if (ELFW(ST_BIND)(info) != STB_LOCAL) {
---
> if (ELF64_ST_BIND(info) != STB_LOCAL) {
497,499c499,501
< sym_bind = ELFW(ST_BIND)(info);
< sym_type = ELFW(ST_TYPE)(info);
< sym_vis = ELFW(ST_VISIBILITY)(other);
---
> sym_bind = ELF64_ST_BIND(info);
> sym_type = ELF64_ST_TYPE(info);
> sym_vis = ELF64_ST_VISIBILITY(other);
511c513
< esym_bind = ELFW(ST_BIND)(esym->st_info);
---
> esym_bind = ELF64_ST_BIND(esym->st_info);
514c516
< esym_vis = ELFW(ST_VISIBILITY)(esym->st_other);
---
> esym_vis = ELF64_ST_VISIBILITY(esym->st_other);
522c524
< esym->st_other = (esym->st_other & ~ELFW(ST_VISIBILITY)(-1))
---
> esym->st_other = (esym->st_other & ~ELF64_ST_VISIBILITY(-1))
560c562
< esym->st_info = ELFW(ST_INFO)(sym_bind, sym_type);
---
> esym->st_info = ELF64_ST_INFO(sym_bind, sym_type);
570c572
< ELFW(ST_INFO)(sym_bind, sym_type), other,
---
> ELF64_ST_INFO(sym_bind, sym_type), other,
598c600
< rel->r_info = ELFW(R_INFO)(symbol, type);
---
> rel->r_info = ELF64_R_INFO(symbol, type);
737c739
< if (ELFW(ST_BIND)(p->st_info) == STB_LOCAL) {
---
> if (ELF64_ST_BIND(p->st_info) == STB_LOCAL) {
750c752
< if (ELFW(ST_BIND)(p->st_info) != STB_LOCAL) {
---
> if (ELF64_ST_BIND(p->st_info) != STB_LOCAL) {
766,767c768,769
< sym_index = ELFW(R_SYM)(rel->r_info);
< type = ELFW(R_TYPE)(rel->r_info);
---
> sym_index = ELF64_R_SYM(rel->r_info);
> type = ELF64_R_TYPE(rel->r_info);
769c771
< rel->r_info = ELFW(R_INFO)(sym_index, type);
---
> rel->r_info = ELF64_R_INFO(sym_index, type);
810c812
< sym_bind = ELFW(ST_BIND)(sym->st_info);
---
> sym_bind = ELF64_ST_BIND(sym->st_info);
838c840
< sym_index = ELFW(R_SYM)(rel->r_info);
---
> sym_index = ELF64_R_SYM(rel->r_info);
840c842
< type = ELFW(R_TYPE)(rel->r_info);
---
> type = ELF64_R_TYPE(rel->r_info);
873,874c875,876
< sym_index = ELFW(R_SYM)(rel->r_info);
< type = ELFW(R_TYPE)(rel->r_info);
---
> sym_index = ELF64_R_SYM(rel->r_info);
> type = ELF64_R_TYPE(rel->r_info);
881c883
< rel->r_info = ELFW(R_INFO)(sym_index, R_386_RELATIVE);
---
> rel->r_info = ELF64_R_INFO(sym_index, R_386_RELATIVE);
916c918
< set_elf_sym(symtab_section, 0, 4, ELFW(ST_INFO)(STB_GLOBAL, STT_OBJECT),
---
> set_elf_sym(symtab_section, 0, 4, ELF64_ST_INFO(STB_GLOBAL, STT_OBJECT),
963c965
< if (ELFW(ST_BIND)(sym->st_info) == STB_LOCAL) {
---
> if (ELF64_ST_BIND(sym->st_info) == STB_LOCAL) {
1008c1010
< ELFW(ST_INFO)(STB_GLOBAL, STT_FUNC), 0, s1->plt->sh_num, plt_name);
---
> ELF64_ST_INFO(STB_GLOBAL, STT_FUNC), 0, s1->plt->sh_num, plt_name);
1034c1036
< type = ELFW(R_TYPE)(rel->r_info);
---
> type = ELF64_R_TYPE(rel->r_info);
1036c1038
< sym_index = ELFW(R_SYM)(rel->r_info);
---
> sym_index = ELF64_R_SYM(rel->r_info);
1068,1070c1070,1072
< && (ELFW(ST_TYPE)(esym->st_info) == STT_FUNC
< || (ELFW(ST_TYPE)(esym->st_info) == STT_NOTYPE
< && ELFW(ST_TYPE)(sym->st_info) == STT_FUNC)))
---
> && (ELF64_ST_TYPE(esym->st_info) == STT_FUNC
> || (ELF64_ST_TYPE(esym->st_info) == STT_NOTYPE
> && ELF64_ST_TYPE(sym->st_info) == STT_FUNC)))
1083,1085c1085,1087
< (ELFW(ST_VISIBILITY)(sym->st_other) != STV_DEFAULT ||
< ELFW(ST_BIND)(sym->st_info) == STB_LOCAL)) {
< rel->r_info = ELFW(R_INFO)(sym_index, R_X86_64_PC32);
---
> (ELF64_ST_VISIBILITY(sym->st_other) != STV_DEFAULT ||
> ELF64_ST_BIND(sym->st_info) == STB_LOCAL)) {
> rel->r_info = ELF64_R_INFO(sym_index, R_X86_64_PC32);
1105c1107
< rel->r_info = ELFW(R_INFO)(attr->plt_sym, type);
---
> rel->r_info = ELF64_R_INFO(attr->plt_sym, type);
1140c1142
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1144c1146
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1168c1170
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1172c1174
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1221c1223
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1225c1227
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1229c1231
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1260c1262
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1265c1267
< ELFW(ST_INFO)(STB_GLOBAL, STT_NOTYPE), 0,
---
> ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE), 0,
1314c1316
< int sym_index = ELFW(R_SYM) (rel->r_info);
---
> int sym_index = ELF64_R_SYM (rel->r_info);
1344c1346
< switch (ELFW(R_TYPE) (rel->r_info)) {
---
> switch (ELF64_R_TYPE (rel->r_info)) {
1363,1364c1365,1366
< if (ELFW(R_TYPE)(rel->r_info) == R_RELATIVE) {
< int sym_index = ELFW(R_SYM) (rel->r_info);
---
> if (ELF64_R_TYPE(rel->r_info) == R_RELATIVE) {
> int sym_index = ELF64_R_SYM (rel->r_info);
1370c1372
< rel->r_info = ELFW(R_INFO)(0, R_RELATIVE);
---
> rel->r_info = ELF64_R_INFO(0, R_RELATIVE);
1400c1402
< type = ELFW(ST_TYPE)(esym->st_info);
---
> type = ELF64_ST_TYPE(esym->st_info);
1411c1413
< ELFW(ST_INFO)(STB_GLOBAL,STT_FUNC), 0, 0,
---
> ELF64_ST_INFO(STB_GLOBAL,STT_FUNC), 0, 0,
1428c1430
< if (ELFW(ST_BIND)(esym->st_info) == STB_WEAK) {
---
> if (ELF64_ST_BIND(esym->st_info) == STB_WEAK) {
1431c1433
< && (ELFW(ST_BIND)(dynsym->st_info) == STB_GLOBAL)) {
---
> && (ELF64_ST_BIND(dynsym->st_info) == STB_GLOBAL)) {
1450c1452
< if (ELFW(ST_BIND)(sym->st_info) == STB_WEAK ||
---
> if (ELF64_ST_BIND(sym->st_info) == STB_WEAK ||
1456c1458
< } else if (s1->rdynamic && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
---
> } else if (s1->rdynamic && ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
1481c1483
< && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
---
> && ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
1486c1488
< if (ELFW(ST_BIND)(esym->st_info) != STB_WEAK)
---
> if (ELF64_ST_BIND(esym->st_info) != STB_WEAK)
1503c1505
< if (ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
---
> if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
1909,1913d1910
< default:
< case TCC_OUTPUT_EXE:
< ehdr.e_type = ET_EXEC;
< ehdr.e_entry = get_elf_sym_addr(s1, "_start", 1);
< break;
1920a1918,1922
> case TCC_OUTPUT_EXE:
> default:
> ehdr.e_type = ET_EXEC;
> ehdr.e_entry = get_elf_sym_addr(s1, "_start", 1);
> break;
2481c2483
< if (ELFW(ST_BIND)(sym->st_info) != STB_LOCAL) {
---
> if (ELF64_ST_BIND(sym->st_info) != STB_LOCAL) {
2520,2521c2522,2523
< type = ELFW(R_TYPE)(rel->r_info);
< sym_index = ELFW(R_SYM)(rel->r_info);
---
> type = ELF64_R_TYPE(rel->r_info);
> sym_index = ELF64_R_SYM(rel->r_info);
2537c2539
< rel->r_info = ELFW(R_INFO)(sym_index, type);
---
> rel->r_info = ELF64_R_INFO(sym_index, type);
2766c2768
< sym_bind = ELFW(ST_BIND)(sym->st_info);
---
> sym_bind = ELF64_ST_BIND(sym->st_info);
---- tccgen.c ----
24a25,26
> #define NODATA_WANTED (nocode_wanted > 0) /* no static data output wanted either */
> #define STATIC_DATA_WANTED (nocode_wanted & 0xC0000000) /* only static data output */
31c33,39
< ST_DATA int rsym, anon_sym, ind, loc;
---
> static int local_scope;
> static int in_sizeof;
> static int section_sym;
>
> ST_DATA int vlas_in_scope; /* number of VLAs that are currently in scope */
> ST_DATA int vla_sp_root_loc; /* vla_sp_loc for SP before any VLAs were pushed */
> ST_DATA int vla_sp_loc; /* Pointer to variable holding location to store stack pointer on the stack when modifying stack pointer */
32a41,42
> #if 0
> ST_DATA int rsym, anon_sym, ind, loc;
42,48d51
< static int local_scope;
< static int in_sizeof;
< static int section_sym;
<
< ST_DATA int vlas_in_scope; /* number of VLAs that are currently in scope */
< ST_DATA int vla_sp_root_loc; /* vla_sp_loc for SP before any VLAs were pushed */
< ST_DATA int vla_sp_loc; /* Pointer to variable holding location to store stack pointer on the stack when modifying stack pointer */
54,55d56
< #define NODATA_WANTED (nocode_wanted > 0) /* no static data output wanted either */
< #define STATIC_DATA_WANTED (nocode_wanted & 0xC0000000) /* only static data output */
63,64c64,66
<
< ST_DATA CType char_pointer_type, func_old_type, int_type, size_type, ptrdiff_type;
---
> ST_DATA CType char_pointer_type, func_old_type, int_type, size_type;
> #endif
> ST_DATA CType ptrdiff_type;
161c163
< ELFW(ST_INFO)(STB_LOCAL, STT_SECTION), 0,
---
> ELF64_ST_INFO(STB_LOCAL, STT_SECTION), 0,
179c181
< ELFW(ST_INFO)(STB_LOCAL, STT_FILE), 0,
---
> ELF64_ST_INFO(STB_LOCAL, STT_FILE), 0,
302c304
< esym->st_other = (esym->st_other & ~ELFW(ST_VISIBILITY)(-1))
---
> esym->st_other = (esym->st_other & ~ELF64_ST_VISIBILITY(-1))
311c313
< old_sym_bind = ELFW(ST_BIND)(esym->st_info);
---
> old_sym_bind = ELF64_ST_BIND(esym->st_info);
313c315
< esym->st_info = ELFW(ST_INFO)(sym_bind, ELFW(ST_TYPE)(esym->st_info));
---
> esym->st_info = ELF64_ST_INFO(sym_bind, ELF64_ST_TYPE(esym->st_info));
410c412
< info = ELFW(ST_INFO)(sym_bind, sym_type);
---
> info = ELF64_ST_INFO(sym_bind, sym_type);
1904d1905
< default: l1 = gen_opic_sdiv(l1, l2); break;
1907a1909
> default: l1 = gen_opic_sdiv(l1, l2); break;
2458a2461,2470
> static long double negate_ld(long double d) {
> #if LDBL_MANT_DIG == 64
> register unsigned long long *p = (unsigned long long *)&d;
> p[1] ^= 1ul<<15;
> return *(long double *)p;
> #else
> return -d;
> #endif
> }
>
2500c2512
< vtop->c.ld = -(long double)-vtop->c.i;
---
> vtop->c.ld = negate_ld((long double)-vtop->c.i);
2505c2517
< vtop->c.ld = -(long double)-(uint32_t)vtop->c.i;
---
> vtop->c.ld = negate_ld((long double)-(uint32_t)vtop->c.i);
6517,6518c6529,6530
< ELFW(R_TYPE)(rel->r_info),
< ELFW(R_SYM)(rel->r_info),
---
> ELF64_R_TYPE(rel->r_info),
> ELF64_R_SYM(rel->r_info),
---- tccpe.c ----
---- tccpp.c ----
25a26
> #if 0
39a41
> #endif
62c64
< #define DEF(id, str) str "\0"
---
> #define DEF(id, str) str "\n"
1506c1508
< if (varg < TOK_IDENT)
---
> if (varg < TOK_IDENT) {
1508a1511
> }
1554c1557
< if (3 == spc)
---
> if (3 == spc) {
1556a1560
> }
3671c3675
< if (c == '\0')
---
> if (c == '\n')
---- tccrun.c ----
---- tcctools.c ----
---- x86_64-gen.c ----
111,141d110
< ST_DATA const int reg_classes[NB_REGS] = {
< /* eax */ RC_INT | RC_RAX,
< /* ecx */ RC_INT | RC_RCX,
< /* edx */ RC_INT | RC_RDX,
< 0,
< 0,
< 0,
< 0,
< 0,
< RC_R8,
< RC_R9,
< RC_R10,
< RC_R11,
< 0,
< 0,
< 0,
< 0,
< /* xmm0 */ RC_FLOAT | RC_XMM0,
< /* xmm1 */ RC_FLOAT | RC_XMM1,
< /* xmm2 */ RC_FLOAT | RC_XMM2,
< /* xmm3 */ RC_FLOAT | RC_XMM3,
< /* xmm4 */ RC_FLOAT | RC_XMM4,
< /* xmm5 */ RC_FLOAT | RC_XMM5,
< /* xmm6 an xmm7 are included so gv() can be used on them,
< but they are not tagged with RC_FLOAT because they are
< callee saved on Windows */
< RC_XMM6,
< RC_XMM7,
< /* st0 */ RC_ST0
< };
<
633c602
< greloca(cur_text_section, vtop->sym, ind + 1, R_X86_64_PLT32, (int)(vtop->c.i-4));
---
> greloca(cur_text_section, vtop->sym, ind + 1, R_X86_64_PC32, (int)(vtop->c.i-4)); // tcc's PLT code doesn't seem to work with static builds
1194a1164,1166
> enum __va_arg_type {
> __va_gen_reg, __va_float_reg, __va_stack
> };
1198,1200d1169
< enum __va_arg_type {
< __va_gen_reg, __va_float_reg, __va_stack
< };
1204d1172
< default: return __va_stack;
1206a1175
> default: return __va_stack;
1244c1213
< char _onstack[nb_args], *onstack = _onstack;
---
> char _onstack[/*nb_args*/1000/*fucking vlas*/], *onstack = _onstack;
1461,1465d1429
< default:
< stack_arg:
< seen_stack_size = ((seen_stack_size + align - 1) & -align) + size;
< break;
<
1476a1441,1445
> default:
> stack_arg:
> seen_stack_size = ((seen_stack_size + align - 1) & -align) + size;
> break;
>
1940,1943d1908
< default:
< case '+':
< a = 0;
< break;
1956a1922,1925
> case '+':
> default:
> a = 0;
> break;
2016,2019d1984
< default:
< case '+':
< a = 0;
< break;
2027a1993,1996
> break;
> case '+':
> default:
> a = 0;
---- x86_64-link.c ----
177c177
< sym_index = ELFW(R_SYM)(rel->r_info);
---
> sym_index = ELF64_R_SYM(rel->r_info);
185c185
< qrel->r_info = ELFW(R_INFO)(esym_index, R_X86_64_64);
---
> qrel->r_info = ELF64_R_INFO(esym_index, R_X86_64_64);
190c190
< qrel->r_info = ELFW(R_INFO)(0, R_X86_64_RELATIVE);
---
> qrel->r_info = ELF64_R_INFO(0, R_X86_64_RELATIVE);
202c202
< qrel->r_info = ELFW(R_INFO)(0, R_X86_64_RELATIVE);
---
> qrel->r_info = ELF64_R_INFO(0, R_X86_64_RELATIVE);
216c216
< qrel->r_info = ELFW(R_INFO)(esym_index, R_X86_64_PC32);
---
> qrel->r_info = ELF64_R_INFO(esym_index, R_X86_64_PC32);
249c249
< qrel->r_info = ELFW(R_INFO)(esym_index, R_X86_64_PC64);
---
> qrel->r_info = ELF64_R_INFO(esym_index, R_X86_64_PC64);
---- lib/armeabi.c ----
---- lib/armflush.c ----
---- lib/bcheck.c ----
---- lib/lib-arm64.c ----
---- lib/libtcc1.c ----
615a616,622
>
> static long double negate_ld(long double d) {
> register unsigned long long *p = (unsigned long long *)&d;
> p[1] ^= 1ul<<15;
> return *(long double *)p;
> }
>
619c626
< ret = __fixunsxfdi((s = a1 >= 0) ? a1 : -a1);
---
> ret = __fixunsxfdi((s = a1 >= 0) ? a1 : negate_ld(a1));
---- lib/va_list.c ----


@ -1,10 +1,7 @@
#define TCC_VERSION "0.9.27"
#define CONFIG_TCC_STATIC 1
//#define CONFIG_TCC_ELFINTERP "/XXX"
//#define CONFIG_TCC_CRT_PREFIX "/XXX"
//#define CONFIG_SYSROOT "/XXX"
#define inline
#define TCC_TARGET_X86_64 1
#define ONE_SOURCE 1
#define CONFIG_LDDIR "lib/x86_64-linux-gnu"
#define CONFIG_TCCDIR "/usr/local/lib/tcc-bootstrap"
#define inline


@ -5,6 +5,8 @@ all: markdown README.html
$(MAKE) -C 03
$(MAKE) -C 04
$(MAKE) -C 04a
# don't compile all of 05 because it takes a while
$(MAKE) -C 05 README.html
clean:
$(MAKE) -C 00 clean
$(MAKE) -C 01 clean


@ -27,7 +27,7 @@ command codes.
- [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
- [stage 04](04/README.md) - a language with nice functions and local variables
- [stage 04a](04a/README.md) - (interlude) a simple preprocessor
- more coming soon (hopefully)
- [stage 05](05/README.md) - a C compiler capable of compiling TCC
## prerequisite knowledge
@ -59,21 +59,21 @@ If you're unfamiliar with x86-64 assembly, you should check out the instruction
Bootstrapping a compiler is not an easy task, so we're trying to make it as easy
as possible. We don't even necessarily need a standard-compliant C compiler, we
only need enough to compile someone else's C compiler, specifically we'll be
only need enough to compile someone else's C compiler. Specifically, we'll be
using [TCC](https://bellard.org/tcc/) since it's written (mostly) in standard C89.
- efficiency is not a concern
We will create big and slow executables, and that's okay. It doesn't really
matter if compiling TCC takes 8 as opposed to 0.01 seconds; once we compile TCC
with itself, we'll get the same executable either way.
matter if compiling TCC takes 30 as opposed to 0.01 seconds; once the process
is finished, we'll get the same executable either way.
## reflections on trusting trust
In 1984, Ken Thompson wrote the well-known article
[Reflections on Trusting Trust](http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf).
This is one of the inspirations for this project. To summarize
the article: it is possible to create a malicious C compiler which will
This is one of the inspirations for this project. A brief summary is:
it's possible to create a malicious C compiler which will
replicate its own malicious functionalities (e.g. detecting password-checking
routines to make them also accept another password the attacker knows) when used
to compile other C compilers. For all we know, such a compiler was used to
@ -224,10 +224,10 @@ Arguments are passed in
The return value is placed in rax.
```
More will be added in the future as needed.
## license
Note that this does not apply to TCC's source code (`05/tcc-0.9.27`).
```
This project is in the public domain. Any copyright protections from any law
are forfeited by the author(s). No warranty is provided, and the author(s)


@ -88,5 +88,14 @@ if [ "$(sed '/^#/d;/^$/d' out04a)" != 'Hello, world!' ]; then
fi
cd ..
echo 'Processing stage 05 (this will take some time)...'
cd 05
rm -f test.out out04 in04 *.o tcc-0.9.27/tcc0
make -s test.out > /dev/null
if [ "$(./test.out)" != 'Hello, world!' ]; then
echo_red 'Stage 05 failed.'
exit 1
fi
cd ..
echo_green 'all stages completed successfully!'