finished preprocessor

pommicket 2022-01-07 14:31:52 -05:00
parent fbe3f4e701
commit e52793324a
12 changed files with 428 additions and 25 deletions


@ -1,4 +1,4 @@
# stage 00 # [bootstrap](../README.md) stage 00
This directory contains the file `hexcompile`, a handwritten executable. It This directory contains the file `hexcompile`, a handwritten executable. It
takes input file `in00` containing space/newline/(any character)-separated takes input file `in00` containing space/newline/(any character)-separated


@ -1,4 +1,4 @@
# stage 01 # [bootstrap](../README.md) stage 01
The code for the compiler for this stage is in the file `in00`. And yes, that's The code for the compiler for this stage is in the file `in00`. And yes, that's
an input to our [previous program](../00/README.html), `hexcompile`, from stage 00! To compile it, an input to our [previous program](../00/README.html), `hexcompile`, from stage 00! To compile it,


@ -1,4 +1,4 @@
# stage 02 # [bootstrap](../README.md) stage 02
The compiler for this stage is in the file `in01`, an input for our [previous compiler](../01/README.md). The compiler for this stage is in the file `in01`, an input for our [previous compiler](../01/README.md).
So if you run `../01/out00`, you'll get the file `out01`, which is So if you run `../01/out00`, you'll get the file `out01`, which is
@ -178,7 +178,7 @@ the command `~~` (the end of the command table overlaps with the start of the la
This command is just 255 bytes of zeros. If you defined a label whose position in the label This command is just 255 bytes of zeros. If you defined a label whose position in the label
table overlaps with these zeros, you'd screw up the command. But fortunately, this will only happen table overlaps with these zeros, you'd screw up the command. But fortunately, this will only happen
if you include `\r` or a non-printing character in your label names. if you include `\r` or a non-printing character in your label names.
This is so that you can have big buffers to put data in (like our label table from this compiler). The `~~` command makes it easier to create big buffers to put data in (like our label table from this compiler).
## limitations ## limitations


@ -1,4 +1,4 @@
# stage 03 # [bootstrap](../README.md) stage 03
The code for this compiler (in the file `in02`, an input for our [stage 02 compiler](../02/README.md)) The code for this compiler (in the file `in02`, an input for our [stage 02 compiler](../02/README.md))
is 2700 lines—quite a bit longer than the previous ones. is 2700 lines—quite a bit longer than the previous ones.
To compile it, run `../02/out01` from this directory. To compile it, run `../02/out01` from this directory.


@ -1,4 +1,4 @@
# stage 04 # [bootstrap](../README.md) stage 04
As usual, the source for this compiler is `in03`, an input to the [previous compiler](../03/README.md). As usual, the source for this compiler is `in03`, an input to the [previous compiler](../03/README.md).
`in04` contains a hello world program written in the stage 4 language. `in04` contains a hello world program written in the stage 4 language.


@ -1,6 +1,8 @@
all: out04 all: out04 out04a README.html
out04: in04 ../04/out03 out04: in04 ../04/out03
../04/out03 ../04/out03
out04a: in04a out04
./out04
%.html: %.md ../markdown %.html: %.md ../markdown
../markdown $< ../markdown $<
clean: clean:


@ -1,23 +1,74 @@
# stage 04a # [bootstrap](../README.md) stage 04a
Rather than a compiler, this stage only consists of a simple [preprocessor](https://en.wikipedia.org/wiki/Preprocessor). Rather than a compiler, this stage only consists of a simple [preprocessor](https://en.wikipedia.org/wiki/Preprocessor).
In the future, we'll run our code through this program, then run its output In the future, we'll run our code through this program, then run its output
through a compiler. through a compiler.
It take lines like: It takes lines like:
``` ```
#define THREE d3 #define NUMBER 349
```
and then replaces `THREE` anywhere in the rest of the code with `d3`.
I've provided `in04a` as a little example.
Unlike previous programs, you can control the input and output file names
without recompiling it. So to compile the example program:
```
make out03
./out03 in04a out04a
``` ```
Although it seems simple, this program will be very useful: and then replaces `NUMBER` anywhere in the rest of the code with `349`.
it'll let us define constants and it'll work in any language. Also, it lets you "include" files in other files. The line
There really isn't much else to say about this program. With that,
we can move on to [the next stage](../04b/README.md) which should be more exciting. ```
#include other_file.txt
```
will put the contents of `other_file.txt` right there.
But wait! If we mess around with source code for our 04 compiler
with a preprocessor, we could screw up the line numbers
in error messages! This is where the `#line` directive from the 04 language comes in.
Let's take a look at the source files `in04a`:
```
#define H Hello,
#include test_inc
H W!
```
and `test_inc`:
```
#define W world
```
When `in04a` gets preprocessed, it turns into:
```
#line 1 in04a
#line 1 test_inc
#line 3 in04a
Hello, world!
```
As we can see, the preprocessor sets up a `#line` directive to put `Hello, world!`
on the line where `H W!` appeared in the source file.
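To make the flow above concrete, here is a minimal Python sketch of the same idea. It is not the stage 04 implementation: the `preprocess` signature, the in-memory `files` dictionary, and the regular expression for names are assumptions made for illustration, and it silently overwrites a repeated `#define` instead of reporting a redefinition error.

```python
import re

# Assumption for this sketch: a name is an uppercase letter followed by
# uppercase letters, digits, or underscores.
NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def preprocess(filename, files, defines, out):
    """Append the preprocessed lines of files[filename] to the list out."""
    out.append(f"#line 1 {filename}")
    for number, line in enumerate(files[filename].splitlines(), start=1):
        if line.startswith("#define "):
            name, _, replacement = line[len("#define "):].partition(" ")
            defines[name] = replacement
            out.append("")  # emit an empty line so later line numbers stay put
        elif line.startswith("#include "):
            preprocess(line[len("#include "):], files, defines, out)
            # after the included file, point back at the next line of this file
            out.append(f"#line {number + 1} {filename}")
        else:
            # replace known names; leave everything else untouched
            out.append(NAME.sub(lambda m: defines.get(m.group(), m.group()), line))
    return out


# hypothetical in-memory stand-ins for the files in04a and test_inc
files = {
    "in04a": "#define H Hello,\n#include test_inc\nH W!\n",
    "test_inc": "#define W world\n",
}
result = preprocess("in04a", files, {}, [])
print("\n".join(result))
```

The sketch prints one empty line for each `#define`, which keeps the following lines at their original line numbers. Dropping the empty lines leaves the four lines shown above.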
Although this program is quite simple, it will be very useful:
we can now define constants and split up our programs across multiple files.
One interesting note about the code itself: rather than create a large
global variable for the `defines` list, I decided to make a little `malloc`
function. This uses the `mmap` syscall to allocate memory.
The benefit of this is that we can allocate 4MB of memory without
adding 4MB to the size of the executable. Also, it lets us free the memory
(using `munmap`),
which isn't particularly useful here, but might be in the future.
Note that replacement text is not itself scanned for further replacements, i.e. the code:
```
#define A 10
#define B A
B
```
will be preprocessed to `A`, not `10`.
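The single-pass behaviour can be illustrated with a short Python sketch. The `expand` helper and the name pattern are hypothetical stand-ins for illustration, not the stage 04 code.

```python
import re

# Assumption for this sketch: a name is an uppercase letter followed by
# uppercase letters, digits, or underscores.
NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def expand(line, defines):
    """Replace each defined name once; the inserted text is not rescanned."""
    return NAME.sub(lambda m: defines.get(m.group(), m.group()), line)


defines = {"A": "10", "B": "A"}
once = expand("B", defines)
print(once)  # A
```

Getting `10` would take a second pass over the output, for example `expand(once, defines)`, which the preprocessor described here does not do.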

339
04a/in04

@ -11,6 +11,9 @@ global output_fd
goto main goto main
global defines
global defines_end
function main function main
argument argv2 argument argv2
argument argv1 argument argv1
@ -19,6 +22,9 @@ function main
local input_filename local input_filename
local output_filename local output_filename
defines = malloc(4000000)
defines_end = defines
if argc < 3 goto default_filenames if argc < 3 goto default_filenames
input_filename = argv1 input_filename = argv1
output_filename = argv2 output_filename = argv2
@ -32,6 +38,9 @@ function main
if output_fd >= 0 goto output_file_good if output_fd >= 0 goto output_file_good
file_error(output_filename) file_error(output_filename)
:output_file_good :output_file_good
preprocess(input_filename, output_fd)
close(output_fd)
free(defines)
exit(0) exit(0)
:str_default_input_filename :str_default_input_filename
@ -42,6 +51,203 @@ function main
string out04a string out04a
byte 0 byte 0
function preprocess
argument input_filename
argument output_fd
local input_fd
global 2048 line_buf
local line
local b
local p
local c
local line_number
line_number = 0
line = &line_buf
; first, open the input file
input_fd = syscall(2, input_filename, 0)
if input_fd >= 0 goto input_file_good
file_error(input_filename)
:input_file_good
; output a line directive
fputs(output_fd, .str_line1)
fputs(output_fd, input_filename)
fputc(output_fd, 10)
:preprocess_loop
line_number += 1
b = fgets(input_fd, line, 2000)
if b == 0 goto preprocess_eof
b = str_startswith(line, .str_define)
if b != 0 goto handle_define
b = str_startswith(line, .str_include)
if b != 0 goto handle_include
; normal line (not #define or #include)
p = line
:normal_line_loop
c = *1p
if c == 0 goto normal_line_loop_end
; optimization: don't look this up if it doesn't start with an uppercase letter
b = isupper(c)
if b == 0 goto no_replacement
b = look_up_define(p)
if b == 0 goto no_replacement
; wow! a replacement!
fputs(output_fd, b)
; advance p past this identifier
:advance_loop
c = *1p
b = is_ident(c)
if b == 0 goto normal_line_loop
p += 1
goto advance_loop
:no_replacement
fputc(output_fd, c)
p += 1
goto normal_line_loop
:normal_line_loop_end
fputc(output_fd, 10)
goto preprocess_loop
:handle_define
local def
def = line + 8 ; 8 = length of "#define "
; make sure define name only consists of identifier characters
p = def
c = *1p
b = isupper(c)
if b == 0 goto bad_define
:define_check_loop
c = *1p
if c == 32 goto define_check_loop_end
b = is_ident(c)
if b == 0 goto bad_define
p += 1
goto define_check_loop
:define_check_loop_end
b = look_up_define(def)
if b != 0 goto redefinition
defines_end = strcpy(defines_end, def)
defines_end += 1
fputc(output_fd, 10) ; don't screw up line numbers
goto preprocess_loop
:bad_define
fputs(2, .str_bad_define)
fputs(2, line)
fputc(2, 10)
exit(1)
:redefinition
fputs(2, .str_redefinition)
fputs(2, line)
fputc(2, 10)
exit(1)
:handle_include
local included_filename
local n
included_filename = line + 9 ; 9 = length of "#include "
preprocess(included_filename, output_fd)
; reset filename and line number
fputs(output_fd, .str_line)
n = line_number + 1
fputn(output_fd, n)
fputc(output_fd, 32)
fputs(output_fd, input_filename)
fputc(output_fd, 10)
goto preprocess_loop
:preprocess_eof
close(input_fd)
return
:str_redefinition
string Preprocessor redefinition:
byte 32
byte 0
:str_bad_define
string Bad preprocessor definition:
byte 32
byte 0
:str_define
string #define
byte 32
byte 0
:str_include
string #include
byte 32
byte 0
:str_line
string #line
byte 32
byte 0
:str_line1
string #line
byte 32
string 1
byte 32
byte 0
; returns a pointer to the thing str should be replaced with,
; or 0 if there is no definition for str.
function look_up_define
argument str
local lookup
local p
local c
lookup = defines
:lookup_loop
c = *1lookup
if c == 0 goto lookup_not_found
c = ident_eq(str, lookup)
if c == 1 goto lookup_found
lookup = memchr(lookup, 0)
lookup += 1
goto lookup_loop
:lookup_not_found
return 0
:lookup_found
p = memchr(lookup, 32)
return p + 1 ; the character after the space following the name is the replacement
; returns 1 if the identifiers s1 and s2 are equal; 0 otherwise
function ident_eq
argument s1
argument s2
local p1
local p2
local c1
local c2
local b1
local b2
p1 = s1
p2 = s2
:ident_eq_loop
c1 = *1p1
c2 = *1p2
b1 = is_ident(c1)
b2 = is_ident(c2)
if b1 != b2 goto return_0
if b1 == 0 goto return_1
if c1 != c2 goto return_0
p1 += 1
p2 += 1
goto ident_eq_loop
function is_ident
argument c
if c < '0 goto return_0
if c <= '9 goto return_1
if c < 'A goto return_0
if c <= 'Z goto return_1
if c == '_ goto return_1
goto return_0
function file_error function file_error
argument name argument name
fputs(2, .str_file_error) fputs(2, .str_file_error)
@ -54,6 +260,33 @@ function file_error
byte 32 byte 32
byte 0 byte 0
function malloc
argument size
local total_size
local memory
total_size = size + 8
memory = syscall(9, 0, total_size, 3, 0x22, -1, 0)
if memory ] 0xffffffffffff0000 goto malloc_failed
*8memory = total_size
return memory + 8
:malloc_failed
fputs(2, .str_out_of_memory)
exit(1)
:str_out_of_memory
string Out of memory.
byte 10
byte 0
function free
argument memory
local psize
local size
psize = memory - 8
size = *8psize
syscall(11, psize, size)
return
; returns a pointer to a null-terminated string containing the number given ; returns a pointer to a null-terminated string containing the number given
function itos function itos
@ -94,6 +327,19 @@ function stoi
:stoi_loop_end :stoi_loop_end
return n return n
function memchr
argument mem
argument c
local p
local a
p = mem
:memchr_loop
a = *1p
if a == c goto memchr_loop_end
p += 1
goto memchr_loop
:memchr_loop_end
return p
function strlen function strlen
argument s argument s
@ -108,6 +354,42 @@ function strlen
:strlen_loop_end :strlen_loop_end
return p - s return p - s
function strcpy
argument dest
argument src
local p
local q
local c
p = dest
q = src
:strcpy_loop
c = *1q
*1p = c
if c == 0 goto strcpy_loop_end
p += 1
q += 1
goto strcpy_loop
:strcpy_loop_end
return p
function str_startswith
argument s
argument prefix
local p
local q
local c1
local c2
p = s
q = prefix
:str_startswith_loop
c1 = *1p
c2 = *1q
if c2 == 0 goto return_1
if c1 != c2 goto return_0
p += 1
q += 1
goto str_startswith_loop
function fputs function fputs
argument fd argument fd
argument s argument s
@ -142,10 +424,67 @@ function putc
fputc(1, c) fputc(1, c)
return return
; returns 0 at end of file
function fgetc
argument fd
local c
local p
c = 0
p = &c
syscall(0, fd, p, 1)
return c
; read a line from fd as a null-terminated string
; returns 0 at end of file, 1 otherwise
; note: a final line with no trailing newline is stored in buf, but 0 is still returned
function fgets
argument fd
argument buf
argument size
local p
local end
local c
p = buf
end = buf + size
:fgets_loop
c = fgetc(fd)
if c == 0 goto fgets_eof
if c == 10 goto fgets_eol
*1p = c
p += 1
if p == end goto fgets_eob
goto fgets_loop
:fgets_eol ; end of line
*1p = 0
return 1
:fgets_eof ; end of file
*1p = 0
return 0
:fgets_eob ; end of buffer
p -= 1
*1p = 0
return 1
function close
argument fd
syscall(3, fd)
return
function isupper
argument c
if c < 'A goto return_0
if c <= 'Z goto return_1
goto return_0
function exit function exit
argument status_code argument status_code
syscall(0x3c, status_code) syscall(0x3c, status_code)
:return_0
return 0
:return_1
return 1
function syscall function syscall
; I've done some testing, and this should be okay even if ; I've done some testing, and this should be okay even if


@ -1,3 +1,3 @@
#define H Hello, #define H Hello,
#define W world #include test_inc
H W! H W!

1
04a/test_inc Normal file

@ -0,0 +1 @@
#define W world


@ -27,11 +27,11 @@ command codes.
- [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation - [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
- more coming soon (hopefully) - more coming soon (hopefully)
- [stage 04](04/README.md) - a language with nice functions and local variables - [stage 04](04/README.md) - a language with nice functions and local variables
- [stage 04a](04a/README.md) - (interlude) a very simple preprocessor - [stage 04a](04a/README.md) - (interlude) a simple preprocessor
## prerequisite knowledge ## prerequisite knowledge
In this series, I want to explain *everything* that's going on. I'm going to In this series, I want *everything* that's going on to be understandable. I'm going to
need to assume some passing knowledge, so here's a quick overview of what you'll need to assume some passing knowledge, so here's a quick overview of what you'll
want to know before starting. want to know before starting.
You don't need to understand everything about each of these, just get You don't need to understand everything about each of these, just get


@ -78,5 +78,15 @@ if [ "$(./out04)" != 'Hello, world!' ]; then
fi fi
cd .. cd ..
echo 'Processing stage 04a...'
cd 04a
rm -f out*
make -s out04a
if [ "$(sed '/^#/d;/^$/d' out04a)" != 'Hello, world!' ]; then
echo_red 'Stage 04a failed.'
exit 1
fi
cd ..
echo_green 'all stages completed successfully!' echo_green 'all stages completed successfully!'