rename 04b => 04, better 04 README

pommicket 2022-01-07 11:07:06 -05:00
parent 4cd2b7047c
commit 519069a89d
8 changed files with 76 additions and 41 deletions


@@ -165,4 +165,4 @@ you need to make sure you store away any information you'll need after the funct
 And the language definitely won't be as nice to use as something with real variables. But overall,
 I'm very happy with this compiler, especially considering it's written in a language with 2-letter label
 names.
-With that, let's move on to the [next stage](../04a/README.md).
+With that, let's move on to the [next stage](../04/README.md).


@@ -1,9 +1,9 @@
-all: out03 guessing_game.out out04b README.html
+all: out03 guessing_game.out out04 README.html
 out03: in03 ../03/out02
 	../03/out02
 %.html: %.md ../markdown
 	../markdown $<
-out04b: in04b out03
+out04: in04 out03
 	./out03
 %.out: % out03
 	./out03 $< $@


@@ -1,38 +1,60 @@
 # stage 04
 As usual, the source for this compiler is `in03`, an input to the [previous compiler](../03/README.md).
-`in04b` contains a hello world program written in the stage 4 language.
+`in04` contains a hello world program written in the stage 4 language.
 Here is the core of the program:
 ```
 main()
 function main
 	puts(.str_hello_world)
 	putc(10) ; newline
 	syscall(0x3c, 0)
-```
-As you can see, we can now pass arguments to functions. And let's take a look at `putc`:
-```
+:str_hello_world
+	string Hello, world!
+	byte 0
+function strlen
+	argument s
+	local c
+	local p
+	p = s
+	:strlen_loop
+	c = *1p
+	if c == 0 goto strlen_loop_end
+	p += 1
+	goto strlen_loop
+	:strlen_loop_end
+	return p - s
 function putc
 	argument c
 	local p
 	p = &c
 	syscall(1, 1, p, 1)
 	return
+function puts
+	argument s
+	local len
+	len = strlen(s)
+	syscall(1, 1, s, len)
+	return
 ```
-It's so simple compared to previous languages! Rather than mess around with registers, we can now
-declare local (and global) variables, and use them directly. These variables will be placed on the
+It's so simple compared to previous languages!
+Importantly, functions now have arguments and return values.
+Rather than mess around with registers, we can now
+declare local (and global) variables, and use them directly.
+These variables will be placed on the
 stack. Since arguments are also placed on the stack,
 by implementing local variables we get arguments for free. There is no difference
 between the `local` and `argument` keywords in this language other than spelling.
 In fact, the number of arguments to a function call is not checked against
 how many arguments the function has. This does make it easy to screw things up by calling a function
 with the wrong number of arguments, but it also means that we can provide a variable number of arguments
-to the `syscall` function. Speaking of which, if you look at the bottom of `in04b`, you'll see:
+to the `syscall` function. Speaking of which, if you look at the bottom of `in04`, you'll see:
 ```
 function syscall
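As a rough C analogue of the new `strlen`, `putc`, and `puts` in `in04` (a sketch, not part of the commit): it assumes x86-64 Linux and glibc's `syscall()` wrapper, and the `s04_` prefix on the function names is made up here to avoid clashing with the C library. As in the stage 04 version, `puts` finds the length first and then makes a single `write` system call (number 1) to file descriptor 1.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> */
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>

/* Walk a pointer up to the terminating zero byte, like the stage 04
   strlen; the length is the end pointer minus the start pointer. */
long s04_strlen(const char *s) {
    const char *p = s;
    while (*p != 0)
        p += 1;
    return p - s;
}

/* write(1, &c, 1): like the stage 04 putc, pass the address of the argument. */
long s04_putc(char c) {
    return syscall(SYS_write, 1L, &c, 1L);
}

/* write(1, s, strlen(s)); returns the number of bytes written or -1. */
long s04_puts(const char *s) {
    return syscall(SYS_write, 1L, s, s04_strlen(s));
}
```

Calling `s04_puts("Hello, world!")` and then `s04_putc('\n')` prints the same line as the stage 04 `main`.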
@@ -53,6 +75,7 @@ Instead, `syscall` is a function written manually in machine language.
 We can take a look at its decompilation to make things clearer:
 ```
+(...function prologue...)
 mov rax,[rbp-0x10]
 mov rdi,rax
 mov rax,[rbp-0x18]
@@ -67,6 +90,7 @@ mov rax,[rbp-0x38]
 mov r9,rax
 mov rax,[rbp-0x8]
 syscall
+(...function epilogue...)
 ```
 This just sets `rax`, `rdi`, `rsi`, etc. to the arguments the function was called with,
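The decompiled `syscall` loads the system call number (its first argument, at `[rbp-0x8]`) into `rax` and the remaining arguments into registers. Under the x86-64 Linux system call convention, the full order is `rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`, which matches the seven stack slots from `[rbp-0x8]` to `[rbp-0x38]`. Below is a minimal sketch of the same register setup in C. It assumes GCC or Clang extended inline assembly on x86-64 Linux, and the name `raw_syscall6` is hypothetical.

```c
/* x86-64 Linux only; GCC/Clang extended asm. */
long raw_syscall6(long number, long a1, long a2, long a3,
                  long a4, long a5, long a6) {
    /* rdi, rsi and rdx have constraint letters (D, S, d); r10, r8 and r9
       do not, so bind them with local register variables. */
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(number), "D"(a1), "S"(a2), "d"(a3),
                        "r"(r10), "r"(r8), "r"(r9)
                      /* the syscall instruction overwrites rcx and r11 */
                      : "rcx", "r11", "memory");
    return ret; /* result in rax; failures come back as -errno */
}
```

For example, `raw_syscall6(0x3c, 0, 0, 0, 0, 0, 0)` would correspond to the `syscall(0x3c, 0)` exit call in `main`. The kernel ignores argument registers that a given system call doesn't use.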
@@ -133,22 +157,29 @@ Note that setting `rsp` very specifically rather than just doing `sub rsp, 8` is
 if we skip over some code with a local variable declaration, or execute a local declaration twice,
 we want `rsp` to be in the right place.
 The first three and last three instructions above are called the function *prologue* and *epilogue*.
-They are all the same for all functions; a prologue is generated at the start of every function,
+They are the same for all functions; a prologue is generated at the start of every function,
 and an epilogue is generated for every return statement.
 The return value is placed in `rax`.
 ## global variables
 Global variables are much simpler than local ones. The variable `:static_memory_end` in the compiler
-keeps track of where to put the next global variable in memory. It is initialized at address `0x440000`,
-which gives us 256KB for code (and strings). When a global variable is added, `:static_memory_end` is increased
+keeps track of where to put the next global variable in memory. It is initialized at address `0x500000`,
+which gives us 1MB for code (and strings). When a global variable is added, `:static_memory_end` is increased
 by its size.
+## misc improvements
+- Errors now give you the line number in decimal instead of hexadecimal.
+- You get an error if you declare a label (or a variable) twice.
+- Conditional jumping is much nicer: e.g. `if x == 3 goto some_label`
+- Comments can now appear on lines with code.
+- You don't need a `d` prefix for decimal numbers.
+- You can control the input and output filenames with command-line arguments (by default, `in04` and `out04` are used).
 ## language description
-Comments begin with `;` and may be put at the end of lines
-with or without code.
+Comments begin with `;`.
+Blank lines are ignored.
 To make the compiler simpler, this language doesn't support fancy
 expressions like `2 * (3 + 5) / 6`. There is a limited set of possible
@@ -176,7 +207,7 @@ conditionally jump to the specified label. `{operator}` should be one of
 - `{lvalue} |= {rvalue}`
 - `{lvalue} ^= {rvalue}`
 - `{lvalue} <= {rvalue}` - left shift `lvalue` by `rvalue`
-- `{lvalue} >= {rvalue}` - right shift `lvalue` by `rvalue`
+- `{lvalue} >= {rvalue}` - right shift `lvalue` by `rvalue` (unsigned)
 - `{function}({term}, {term}, ...)` - function call, ignoring the return value
 - `return {rvalue}`
 - `string {str}` - places a literal string in the code
@@ -185,7 +216,7 @@ conditionally jump to the specified label. `{operator}` should be one of
 Now let's get down into the weeds:
 A *number* is one of:
-- `{decimal number}` - e.g. `108` (note: there's no `d` prefix anymore)
+- `{decimal number}` - e.g. `108`
 - `0x{hexadecimal number}` - e.g. `0x2f` for 47
 - `'{character}` - e.g. `'a` for 97 (the character code for `a`)
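The three number forms above could be parsed roughly as follows. This C sketch is illustrative only, since the real compiler is written in the 03 language. `parse_number` is a hypothetical helper. Accepting uppercase hex digits and requiring exactly one character after `'` are assumptions, not documented behavior.

```c
#include <stdint.h>

/* Parse "108", "0x2f" or "'a". Returns 0 and stores the value in *out
   on success, or -1 if the text is not a valid number. */
int parse_number(const char *s, uint64_t *out) {
    uint64_t v = 0;
    if (s[0] == '\'') {                          /* 'a -> character code */
        if (s[1] == 0 || s[2] != 0) return -1;
        *out = (unsigned char)s[1];
        return 0;
    }
    if (s[0] == '0' && s[1] == 'x') {            /* 0x2f -> hexadecimal */
        const char *p = s + 2;
        if (*p == 0) return -1;
        for (; *p; p++) {
            int d;
            if (*p >= '0' && *p <= '9') d = *p - '0';
            else if (*p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
            else if (*p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
            else return -1;
            v = v * 16 + (uint64_t)d;
        }
        *out = v;
        return 0;
    }
    if (*s == 0) return -1;                      /* plain decimal, no prefix */
    for (const char *p = s; *p; p++) {
        if (*p < '0' || *p > '9') return -1;
        v = v * 10 + (uint64_t)(*p - '0');
    }
    *out = v;
    return 0;
}
```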
@@ -194,7 +225,7 @@ A *term* is one of:
 - `.{label name}` - the address of a label
 - `{number}`
-An *lvalue* is the left-hand side of an assignment expression,
+An *l-value* is the left-hand side of an assignment expression,
 and it is one of:
 - `{variable}`
 - `*1{variable}` - dereference 1 byte
@@ -202,8 +233,8 @@ and it is one of:
 - `*4{variable}` - dereference 4 bytes
 - `*8{variable}` - dereference 8 bytes
-An *rvalue* is an expression, which can be more complicated than a term.
-rvalues are one of:
+An *r-value* is an expression, which can be more complicated than a term.
+r-values are one of:
 - `{term}`
 - `&{variable}` - address of variable
 - `*1{variable}` / `*2{variable}` / `*4{variable}` / `*8{variable}` - dereference 1, 2, 4, or 8 bytes
@@ -218,7 +249,7 @@ rvalues are one of:
 - `{term} | {term}`
 - `{term} ^ {term}`
 - `{term} < {term}` - left shift
-- `{term} > {term}` - right shift
+- `{term} > {term}` - right shift (unsigned)
 That's quite a lot of stuff, and it makes for a pretty powerful
 language, all things considered. To test out the language,
@@ -236,5 +267,5 @@ of branching in this language (`if ... goto ...` stands in for `if`, `else if`,
 you need to use a lot of labels, and that means their names can get quite long. But at least unlike
 the 03 language, you'll get an error if you use the same label name twice!
-Overall, though, this language ended up being surprisingly powerful. With any luck, the next stage will
-finally be a C compiler...
+Overall, though, this language ended up being surprisingly powerful. With any luck, stage `05` will
+finally be a C compiler... But first, it's time to make [something that's not a compiler](../04a/README.html).
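The README now marks both `>=` and `>` as *unsigned* right shifts. Zeros come in from the top even when the highest bit is set, so on x86-64 this is presumably `shr` rather than `sar`. In C, the same distinction is between shifting a `uint64_t` and shifting a negative `int64_t`. Here is a small sketch, assuming 64-bit values, with hypothetical function names.

```c
#include <stdint.h>

/* Logical (unsigned) right shift: vacated high bits become 0.
   n must be less than 64; larger shifts are undefined in C. */
uint64_t shr_unsigned(uint64_t x, unsigned n) {
    return x >> n;
}

/* For contrast, an arithmetic shift copies the sign bit. Right-shifting a
   negative signed value is implementation-defined in C; GCC and Clang
   shift arithmetically. */
int64_t shr_signed(int64_t x, unsigned n) {
    return x >> n;
}
```

With this interpretation, `-8` (all ones except the low three bits) shifted right by one becomes a large positive number rather than `-4`.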


@@ -11,21 +11,21 @@ function main
 	local p_line
 	p_line = &input_line
 	secret_number = getrand(100)
-	fputs(1, .str_intro)
+	puts(.str_intro)
 	:guess_loop
-	fputs(1, .str_guess)
+	puts(.str_guess)
 	syscall(0, 0, p_line, 30)
 	guess = stoi(p_line)
 	if guess < secret_number goto too_low
 	if guess > secret_number goto too_high
-	fputs(1, .str_got_it)
+	puts(.str_got_it)
 	return 0
 	:too_low
-	fputs(1, .str_too_low)
+	puts(.str_too_low)
 	goto guess_loop
 	:too_high
-	fputs(1, .str_too_high)
+	puts(.str_too_high)
 	goto guess_loop
 :str_intro
@@ -61,7 +61,7 @@ function getrand
 	local n
 	ptime = &getrand_time
-	syscall(228, 1, ptime)
+	syscall(228, 0, ptime) ; clock_gettime(CLOCK_REALTIME, ptime)
 	ptime += 8 ; nanoseconds at offset 8 in struct timespec
 	n = *4ptime
 	n %= x
@@ -128,6 +128,10 @@ function fputs
 	syscall(1, fd, s, length)
 	return
+function puts
+	argument s
+	fputs(1, s)
+	return
 function fputn
 	argument fd
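The `getrand` change switches the clock ID passed to system call 228 (`clock_gettime` on x86-64 Linux) from 1 (`CLOCK_MONOTONIC`) to 0 (`CLOCK_REALTIME`), matching the new comment. The function then reads 4 bytes at offset 8 of the `struct timespec`, which is the low half of `tv_nsec`: x86-64 is little-endian, and `tv_nsec` is always below 10^9, so it fits in 32 bits. The result is reduced modulo `x`. Below is a sketch of the same idea through the libc wrapper. The name `getrand` mirrors the stage 04 function. This is only a cheap source of variation for a guessing game, not a real random number generator.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#include <stdint.h>
#include <time.h>

/* Return a number in [0, x) derived from the current nanosecond count.
   x must be positive. */
long getrand(long x) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);   /* same clock as syscall(228, 0, ptime) */
    uint32_t n = (uint32_t)ts.tv_nsec;    /* like n = *4ptime at offset 8 */
    return (long)((uint64_t)n % (uint64_t)x);
}
```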


@@ -4,8 +4,8 @@ D=:global_variables
 8C=D
 ; initialize static_memory_end
 C=:static_memory_end
-; 0x80000 = 512KB for code
-D=x480000
+; 0x100000 = 1MB for code
+D=x500000
 8C=D
 ; initialize labels_end
 C=:labels_end
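This hunk is the change behind the README's new `0x500000` figure. `:static_memory_end` starts at `0x500000`, and each global variable takes the next `size` bytes, so the scheme is a bump allocator. Below is a C sketch with hypothetical names. The `0x400000` code start is an assumption inferred from the comment "0x100000 = 1MB for code" (0x500000 - 0x400000 = 0x100000).

```c
#include <stdint.h>

/* Assumption: code is loaded at 0x400000, so the 1MB below 0x500000 is
   available for code and strings. */
#define CODE_START          0x400000u
#define STATIC_MEMORY_START 0x500000u

static uint64_t static_memory_end = STATIC_MEMORY_START;

/* Return the address for a new global of `size` bytes and move
   static_memory_end past it. */
uint64_t add_global(uint64_t size) {
    uint64_t addr = static_memory_end;
    static_memory_end += size;
    return addr;
}
```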
@@ -1980,11 +1980,11 @@ align
 x85
 :input_filename
-str in04b
+str in04
 x0
 :output_filename
-str out04b
+str out04
 x0
 :input_file_error
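The compiler's built-in `:input_filename` and `:output_filename` strings are now `in04` and `out04`. These are the defaults used when no filenames are given on the command line. Below is a C sketch of that selection logic. It assumes that the first argument is the input file and the second the output file, as the Makefile's `./out03 $< $@` rule suggests. The names `choose_filenames`, `DEFAULT_INPUT`, and `DEFAULT_OUTPUT` are hypothetical, and the behavior with exactly one argument is also an assumption.

```c
static const char DEFAULT_INPUT[] = "in04";
static const char DEFAULT_OUTPUT[] = "out04";

/* Pick input/output filenames from argv, falling back to the defaults.
   Assumption: argv[1] is the input and argv[2] the output; each missing
   argument falls back to its own default. */
void choose_filenames(int argc, char **argv,
                      const char **input, const char **output) {
    *input  = argc > 1 ? argv[1] : DEFAULT_INPUT;
    *output = argc > 2 ? argv[2] : DEFAULT_OUTPUT;
}
```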


@@ -3,15 +3,15 @@ all: markdown README.html
 	$(MAKE) -C 01
 	$(MAKE) -C 02
 	$(MAKE) -C 03
+	$(MAKE) -C 04
 	$(MAKE) -C 04a
-	$(MAKE) -C 04b
 clean:
 	$(MAKE) -C 00 clean
 	$(MAKE) -C 01 clean
 	$(MAKE) -C 02 clean
 	$(MAKE) -C 03 clean
+	$(MAKE) -C 04 clean
 	$(MAKE) -C 04a clean
-	$(MAKE) -C 04b clean
 	rm -f markdown
 	rm -f README.html
 markdown: markdown.c


@@ -26,8 +26,8 @@ command codes.
 - [stage 02](02/README.md) - a language with labels
 - [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
 - more coming soon (hopefully)
+- [stage 04](04/README.md) - a language with nice functions and local variables
 - [stage 04a](04a/README.md) - (interlude) a very simple preprocessor
-- [stage 04b](04b/README.md) - a language with nice functions and local variables
 ## prerequisite knowledge
@@ -114,4 +114,4 @@ shall not be held liable in connection with it.
 ## contributing
 If you notice a mistake/want to clarify something, you can submit a pull request
-via GitHub, or email `pommicket at pommicket.com`. Translations are welcome!
+via GitHub, or email `pommicket at pommicket.com`.