rename 04b => 04, better 04 README
parent 4cd2b7047c
commit 519069a89d
8 changed files with 76 additions and 41 deletions

@@ -165,4 +165,4 @@ you need to make sure you store away any information you'll need after the funct
And the language definitely won't be as nice to use as something with real variables. But overall,
I'm very happy with this compiler, especially considering it's written in a language with 2-letter label
names.
With that, let's move on to the [next stage](../04a/README.md).
With that, let's move on to the [next stage](../04/README.md).

@@ -1,9 +1,9 @@
all: out03 guessing_game.out out04b README.html
all: out03 guessing_game.out out04 README.html
out03: in03 ../03/out02
../03/out02
%.html: %.md ../markdown
../markdown $<
out04b: in04b out03
out04: in04 out03
./out03
%.out: % out03
./out03 $< $@

@@ -1,38 +1,60 @@
# stage 04

As usual, the source for this compiler is `in03`, an input to the [previous compiler](../03/README.md).
`in04b` contains a hello world program written in the stage 4 language.
`in04` contains a hello world program written in the stage 4 language.
Here is the core of the program:

```
main()
```main()

function main
puts(.str_hello_world)
putc(10) ; newline
syscall(0x3c, 0)
```

As you can see, we can now pass arguments to functions. And let's take a look at `putc`:
:str_hello_world
string Hello, world!
byte 0

function strlen
argument s
local c
local p
p = s
:strlen_loop
c = *1p
if c == 0 goto strlen_loop_end
p += 1
goto strlen_loop
:strlen_loop_end
return p - s

```
function putc
argument c
local p
p = &c
syscall(1, 1, p, 1)
return

function puts
argument s
local len
len = strlen(s)
syscall(1, 1, s, len)
return
```

It's so simple compared to previous languages! Rather than mess around with registers, we can now
declare local (and global) variables, and use them directly. These variables will be placed on the
It's so simple compared to previous languages!
Importantly, functions now have arguments and return values.
Rather than mess around with registers, we can now
declare local (and global) variables, and use them directly.
These variables will be placed on the
stack. Since arguments are also placed on the stack,
by implementing local variables we get arguments for free. There is no difference
between the `local` and `argument` keywords in this language other than spelling.
In fact, the number of agruments to a function call is not checked against
how many arguments the function has. This does make it easy to screw things up by calling a function
with the wrong number of arguments, but it also means that we can provide a variable number of arguments
to the `syscall` function. Speaking of which, if you look at the bottom of `in04b`, you'll see:
to the `syscall` function. Speaking of which, if you look at the bottom of `in04`, you'll see:

```
function syscall
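
For comparison, here is a rough C sketch of the three helpers in the new README example (`strlen`, `putc`, `puts`). It is not the author's code: the names `stage04_strlen`, `stage04_putc` and `stage04_puts` are made up to avoid clashing with the C library, and POSIX `write()` stands in for the stage 04 `syscall(1, ...)` calls (system call 1 is `write` on x86-64 Linux).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Like stage 04's strlen: advance p until it points at the 0 byte,
   then return the distance travelled. */
size_t stage04_strlen(const char *s) {
    assert(s != NULL);
    const char *p = s;
    while (*p != 0)
        p += 1;
    return (size_t)(p - s);
}

/* Like stage 04's putc: write one byte to fd 1 (stdout).
   The stage 04 version passes &c, the address of the 8-byte argument;
   x86-64 is little-endian, so that address holds the low byte of c. */
void stage04_putc(long c) {
    unsigned char byte = (unsigned char)c;
    ssize_t written = write(1, &byte, 1);
    (void)written; /* errors are ignored, as in the stage 04 version */
}

/* Like stage 04's puts: one write of strlen(s) bytes, with no newline
   appended (unlike the C library's puts). */
void stage04_puts(const char *s) {
    ssize_t written = write(1, s, stage04_strlen(s));
    (void)written;
}
```

With these, the README's `main` corresponds roughly to `stage04_puts("Hello, world!"); stage04_putc(10);` followed by the exit system call.
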
@@ -53,6 +75,7 @@ Instead, `syscall` is a function written manually in machine language.
We can take a look at its decompilation to make things clearer:

```
(...function prologue...)
mov rax,[rbp-0x10]
mov rdi,rax
mov rax,[rbp-0x18]
@@ -67,6 +90,7 @@ mov rax,[rbp-0x38]
mov r9,rax
mov rax,[rbp-0x8]
syscall
(...function epilogue...)
```

This just sets `rax`, `rdi`, `rsi`, etc. to the arguments the function was called with,
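
The decompilation only shows the first and last register loads. A small C sketch of a hypothetical code generator can spell out the whole pattern. It assumes the Linux x86-64 system call convention (call number in `rax`, arguments in `rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`) and assumes the elided stack offsets follow the visible 8-byte stride (`0x10`, `0x18`, ..., `0x38`). `emit_syscall_body` is an illustrative name, not something in the compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Linux x86-64 system calls take the call number in rax and up to six
   arguments in rdi, rsi, rdx, r10, r8, r9, in that order. */
static const char *const syscall_arg_registers[6] = {
    "rdi", "rsi", "rdx", "r10", "r8", "r9"
};

/* Writes the body of a syscall(number, a1, ..., a6) function into buf and
   returns its length. Assumption: argument i (i = 0 is the call number) is
   stored at [rbp-0x8*(i+1)], matching the offsets visible in the
   decompilation (0x8, 0x10, 0x18, 0x38). */
size_t emit_syscall_body(char *buf, size_t cap) {
    size_t used = 0;
    for (int i = 1; i <= 6; i++) {
        int n = snprintf(buf + used, cap - used,
                         "mov rax,[rbp-0x%x]\nmov %s,rax\n",
                         (unsigned)(8 * (i + 1)), syscall_arg_registers[i - 1]);
        assert(n >= 0 && (size_t)n < cap - used); /* buf must be big enough */
        used += (size_t)n;
    }
    /* Load the call number last, since rax was used as scratch above. */
    int n = snprintf(buf + used, cap - used, "mov rax,[rbp-0x8]\nsyscall\n");
    assert(n >= 0 && (size_t)n < cap - used);
    return used + (size_t)n;
}
```

Loading `rax` last matters because, like the decompilation, the sketch routes every other argument through `rax` on its way to its register.
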
@@ -133,22 +157,29 @@ Note that setting `rsp` very specifically rather than just doing `sub rsp, 8` is
if we skip over some code with a local variable declaration, or execute a local declaration twice,
we want `rsp` to be in the right place.
The first three and last three instructions above are called the function *prologue* and *epilogue*.
They are all the same for all functions; a prologue is generated at the start of every function,
They are the same for all functions; a prologue is generated at the start of every function,
and an epilogue is generated for every return statement.
The return value is placed in `rax`.

## global variables

Global variables are much simpler than local ones. The variable `:static_memory_end` in the compiler
keeps track of where to put the next global variable in memory. It is initialized at address `0x440000`,
which gives us 256KB for code (and strings). When a global variable is added, `:static_memory_end` is increased
keeps track of where to put the next global variable in memory. It is initialized at address `0x500000`,
which gives us 1MB for code (and strings). When a global variable is added, `:static_memory_end` is increased
by its size.

## misc improvements

- Errors now give you the line number in decimal instead of hexadecimal.
- You get an error if you declare a label (or a variable) twice.
- Conditional jumping is much nicer: e.g. `if x == 3 goto some_label`
- Comments can now appear on lines with code.
- You don't need a `d` prefix for decimal numbers.
- You can control the input and output filenames with command-line arguments (by default, `in04` and `out04` are used).

## language description

Comments begin with `;` and may be put at the end of lines
with or without code.
Blank lines are ignored.
Comments begin with `;`.

To make the compiler simpler, this language doesn't support fancy
expressions like `2 * (3 + 5) / 6`. There is a limited set of possible
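
The `:static_memory_end` scheme is a bump allocator. Below is a minimal C sketch, assuming code is loaded at `0x400000` (the address that makes `0x500000` leave 1MB for code, and that also fits the old `0x440000` = 256KB and `0x480000` = 512KB figures in this diff) and assuming no alignment padding, which the text does not mention. `add_global_variable` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: code starts at 0x400000 (0x500000 minus the 1MB code area). */
#define CODE_START 0x400000u
#define GLOBALS_START 0x500000u

/* Plays the role of :static_memory_end: the address of the next free global. */
static uint64_t static_memory_end = GLOBALS_START;

/* Hands out the current end as the new global's address, then bumps the
   end by the global's size. No alignment is applied in this sketch. */
uint64_t add_global_variable(uint64_t size) {
    assert(size > 0);
    uint64_t address = static_memory_end;
    static_memory_end += size;
    return address;
}
```

Each global's address is fixed at compile time, which is why globals are simpler than locals: no `rbp`-relative offsets or `rsp` bookkeeping are involved.
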
@@ -176,7 +207,7 @@ conditionally jump to the specified label. `{operator}` should be one of
- `{lvalue} |= {rvalue}`
- `{lvalue} ^= {rvalue}`
- `{lvalue} <= {rvalue}` - left shift `lvalue` by `rvalue`
- `{lvalue} >= {rvalue}` - right shift `lvalue` by `rvalue`
- `{lvalue} >= {rvalue}` - right shift `lvalue` by `rvalue` (unsigned)
- `{function}({term}, {term}, ...)` - function call, ignoring the return value
- `return {rvalue}`
- `string {str}` - places a literal string in the code
@@ -185,7 +216,7 @@ conditionally jump to the specified label. `{operator}` should be one of
Now let's get down into the weeds:

A a *number* is one of:
- `{decimal number}` - e.g. `108` (note: there's no `d` prefix anymore)
- `{decimal number}` - e.g. `108`
- `0x{hexadecimal number}` - e.g. `0x2f` for 47
- `'{character}` - e.g. `'a` for 97 (the character code for `a`)

@@ -194,7 +225,7 @@ A *term* is one of:
- `.{label name}` - the address of a label
- `{number}`

An *lvalue* is the left-hand side of an assignment expression,
An *l-value* is the left-hand side of an assignment expression,
and it is one of:
- `{variable}`
- `*1{variable}` - dereference 1 byte
@@ -202,8 +233,8 @@ and it is one of:
- `*4{variable}` - dereference 4 bytes
- `*8{variable}` - dereference 8 bytes

An *rvalue* is an expression, which can be more complicated than a term.
rvalues are one of:
An *r-value* is an expression, which can be more complicated than a term.
r-values are one of:
- `{term}`
- `&{variable}` - address of variable
- `*1{variable}` / `*2{variable}` / `*4{variable}` / `*8{variable}` - dereference 1, 2, 4, or 8 bytes
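
To make the `*1`/`*2`/`*4`/`*8` forms concrete (e.g. `c = *1p` in `strlen`), here is a C sketch of what loading or storing N bytes means on a little-endian machine like x86-64. Two details are assumptions of the sketch, not statements from the README: loads narrower than 8 bytes are zero-extended, and a store writes only its N bytes. `load_n` and `store_n` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian, as on x86-64: the byte at the lowest address is the
   least significant one. */

/* r-value *N{variable}: read N bytes at address p. Zero extension to
   64 bits is an assumption of this sketch. */
uint64_t load_n(const unsigned char *p, int n) {
    assert(n == 1 || n == 2 || n == 4 || n == 8);
    uint64_t value = 0;
    for (int i = n - 1; i >= 0; i--)
        value = (value << 8) | p[i];
    return value;
}

/* l-value *N{variable} = ...: store the low N bytes of value at address p,
   leaving the bytes after them untouched. */
void store_n(unsigned char *p, int n, uint64_t value) {
    assert(n == 1 || n == 2 || n == 4 || n == 8);
    for (int i = 0; i < n; i++) {
        p[i] = (unsigned char)(value & 0xff);
        value >>= 8;
    }
}
```
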
@@ -218,7 +249,7 @@ rvalues are one of:
- `{term} | {term}`
- `{term} ^ {term}`
- `{term} < {term}` - left shift
- `{term} > {term}` - right shift
- `{term} > {term}` - right shift (unsigned)

That's quite a lot of stuff, and it makes for a pretty powerful
language, all things considered. To test out the language,
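
This diff marks both right-shift operators as unsigned. In C terms that is a logical shift on a `uint64_t`: the vacated high bits become zeros, whereas a signed (arithmetic) shift copies the sign bit into them. A sketch contrasting the two; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* {term} < {term} and {lvalue} <= {rvalue}: left shift. */
uint64_t shift_left(uint64_t value, unsigned amount) {
    assert(amount < 64); /* shifting a 64-bit value by 64 or more is undefined in C */
    return value << amount;
}

/* {term} > {term} and {lvalue} >= {rvalue}: unsigned (logical) right shift.
   Vacated high bits are filled with zeros. */
uint64_t shift_right_unsigned(uint64_t value, unsigned amount) {
    assert(amount < 64);
    return value >> amount;
}

/* For contrast only: a signed (arithmetic) right shift copies the sign bit.
   In C, >> on a negative signed value is implementation-defined, so this
   version shifts the non-negative complement instead. */
int64_t shift_right_signed(int64_t value, unsigned amount) {
    assert(amount < 64);
    if (value >= 0)
        return value >> amount;
    return ~(~value >> amount);
}
```

For example, shifting -16 right by 2 gives -4 as a signed shift but `0x3FFFFFFFFFFFFFFC` as an unsigned one.
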
@@ -236,5 +267,5 @@ of branching in this language (`if ... goto ...` stands in for `if`, `else if`,
you need to use a lot of labels, and that means their names can get quite long. But at least unlike
the 03 language, you'll get an error if you use the same label name twice!

Overall, though, this language ended up being surprisingly powerful. With any luck, the next stage will
finally be a C compiler...
Overall, though, this language ended up being surprisingly powerful. With any luck, stage `05` will
finally be a C compiler... But first, it's time to make [something that's not a compiler](../04a/README.html).

@@ -11,21 +11,21 @@ function main
local p_line
p_line = &input_line
secret_number = getrand(100)
fputs(1, .str_intro)
puts(.str_intro)

:guess_loop
fputs(1, .str_guess)
puts(.str_guess)
syscall(0, 0, p_line, 30)
guess = stoi(p_line)
if guess < secret_number goto too_low
if guess > secret_number goto too_high
fputs(1, .str_got_it)
puts(.str_got_it)
return 0
:too_low
fputs(1, .str_too_low)
puts(.str_too_low)
goto guess_loop
:too_high
fputs(1, .str_too_high)
puts(.str_too_high)
goto guess_loop

:str_intro
@@ -61,7 +61,7 @@ function getrand
local n

ptime = &getrand_time
syscall(228, 1, ptime)
syscall(228, 0, ptime) ; clock_gettime(CLOCK_REALTIME, ptime)
ptime += 8 ; nanoseconds at offset 8 in struct timespec
n = *4ptime
n %= x
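
The `getrand` change swaps the clock: on x86-64 Linux, system call 228 is `clock_gettime`, and clock ID 0 is `CLOCK_REALTIME` while 1 is `CLOCK_MONOTONIC`. On x86-64, `tv_nsec` sits at offset 8 of `struct timespec`, right after the 8-byte `tv_sec`. Below is a C sketch of the same idea using the POSIX `clock_gettime()` wrapper; it only mirrors the lines visible in this hunk, and the error check is an addition of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Like the game's getrand(x): read CLOCK_REALTIME and reduce the
   nanoseconds field modulo x. Fine for a guessing game, not for
   anything that needs unpredictable numbers. */
uint64_t getrand(uint64_t x) {
    assert(x > 0);
    struct timespec now;
    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        return 0; /* the stage 04 code shown here does not check for errors */
    /* tv_nsec is always in [0, 999999999], so it also fits in the 4 bytes
       that the game's n = *4ptime reads. */
    uint64_t n = (uint64_t)now.tv_nsec;
    return n % x;
}
```
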
@@ -128,6 +128,10 @@ function fputs
syscall(1, fd, s, length)
return

function puts
argument s
fputs(1, s)
return

function fputn
argument fd

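The new `puts` is just `fputs` with the file descriptor fixed to 1 (stdout). Here is a C sketch of the pair, using POSIX `write()` in place of `syscall(1, ...)`. `game_fputs` and `game_puts` are hypothetical names chosen to avoid the C library's `fputs` and `puts`, which take a `FILE *` and, in the case of `puts`, append a newline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Like the game's fputs(fd, s): write the zero-terminated string s to fd.
   Unlike the single syscall(1, fd, s, length) in the game, this loops in
   case write() accepts only part of the data. */
void game_fputs(int fd, const char *s) {
    assert(s != NULL);
    size_t length = strlen(s);
    while (length > 0) {
        ssize_t written = write(fd, s, length);
        if (written <= 0)
            return; /* give up on error */
        s += written;
        length -= (size_t)written;
    }
}

/* Like the new puts(s): shorthand for fputs(1, s). No newline is added. */
void game_puts(const char *s) {
    game_fputs(1, s);
}
```

Every former `fputs(1, ...)` call in `main` then becomes a `puts(...)` call, which is exactly the substitution the previous hunk makes.
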
@@ -4,8 +4,8 @@ D=:global_variables
8C=D
; initialize static_memory_end
C=:static_memory_end
; 0x80000 = 512KB for code
D=x480000
; 0x100000 = 1MB for code
D=x500000
8C=D
; initialize labels_end
C=:labels_end
@@ -1980,11 +1980,11 @@ align
x85

:input_filename
str in04b
str in04
x0

:output_filename
str out04b
str out04
x0

:input_file_error

Makefile
@@ -3,15 +3,15 @@ all: markdown README.html
$(MAKE) -C 01
$(MAKE) -C 02
$(MAKE) -C 03
$(MAKE) -C 04
$(MAKE) -C 04a
$(MAKE) -C 04b
clean:
$(MAKE) -C 00 clean
$(MAKE) -C 01 clean
$(MAKE) -C 02 clean
$(MAKE) -C 03 clean
$(MAKE) -C 04 clean
$(MAKE) -C 04a clean
$(MAKE) -C 04b clean
rm -f markdown
rm -f README.html
markdown: markdown.c

@@ -26,8 +26,8 @@ command codes.
- [stage 02](02/README.md) - a language with labels
- [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
- more coming soon (hopefully)
- [stage 04](04/README.md) - a language with nice functions and local variables
- [stage 04a](04a/README.md) - (interlude) a very simple preprocessor
- [stage 04b](04b/README.md) - a language with nice functions and local variables

## prerequisite knowledge

@@ -114,4 +114,4 @@ shall not be held liable in connection with it.
## contributing

If you notice a mistake/want to clarify something, you can submit a pull request
via GitHub, or email `pommicket at pommicket.com`. Translations are welcome!
via GitHub, or email `pommicket at pommicket.com`.