rename 04b => 04, better 04 README
parent
4cd2b7047c
commit
519069a89d
8 changed files with 76 additions and 41 deletions
|
@ -165,4 +165,4 @@ you need to make sure you store away any information you'll need after the funct
|
||||||
And the language definitely won't be as nice to use as something with real variables. But overall,
|
And the language definitely won't be as nice to use as something with real variables. But overall,
|
||||||
I'm very happy with this compiler, especially considering it's written in a language with 2-letter label
|
I'm very happy with this compiler, especially considering it's written in a language with 2-letter label
|
||||||
names.
|
names.
|
||||||
With that, let's move on to the [next stage](../04a/README.md).
|
With that, let's move on to the [next stage](../04/README.md).
|
||||||
|
|
|
@ -1,9 +1,9 @@
|
||||||
all: out03 guessing_game.out out04b README.html
|
all: out03 guessing_game.out out04 README.html
|
||||||
out03: in03 ../03/out02
|
out03: in03 ../03/out02
|
||||||
../03/out02
|
../03/out02
|
||||||
%.html: %.md ../markdown
|
%.html: %.md ../markdown
|
||||||
../markdown $<
|
../markdown $<
|
||||||
out04b: in04b out03
|
out04: in04 out03
|
||||||
./out03
|
./out03
|
||||||
%.out: % out03
|
%.out: % out03
|
||||||
./out03 $< $@
|
./out03 $< $@
|
|
@ -1,38 +1,60 @@
|
||||||
# stage 04
|
# stage 04
|
||||||
|
|
||||||
As usual, the source for this compiler is `in03`, an input to the [previous compiler](../03/README.md).
|
As usual, the source for this compiler is `in03`, an input to the [previous compiler](../03/README.md).
|
||||||
`in04b` contains a hello world program written in the stage 4 language.
|
`in04` contains a hello world program written in the stage 4 language.
|
||||||
Here is the core of the program:
|
Here is the core of the program:
|
||||||
|
|
||||||
```
|
```main()
|
||||||
main()
|
|
||||||
|
|
||||||
function main
|
function main
|
||||||
puts(.str_hello_world)
|
puts(.str_hello_world)
|
||||||
putc(10) ; newline
|
putc(10) ; newline
|
||||||
syscall(0x3c, 0)
|
syscall(0x3c, 0)
|
||||||
```
|
|
||||||
|
|
||||||
As you can see, we can now pass arguments to functions. And let's take a look at `putc`:
|
:str_hello_world
|
||||||
|
string Hello, world!
|
||||||
|
byte 0
|
||||||
|
|
||||||
|
function strlen
|
||||||
|
argument s
|
||||||
|
local c
|
||||||
|
local p
|
||||||
|
p = s
|
||||||
|
:strlen_loop
|
||||||
|
c = *1p
|
||||||
|
if c == 0 goto strlen_loop_end
|
||||||
|
p += 1
|
||||||
|
goto strlen_loop
|
||||||
|
:strlen_loop_end
|
||||||
|
return p - s
|
||||||
|
|
||||||
```
|
|
||||||
function putc
|
function putc
|
||||||
argument c
|
argument c
|
||||||
local p
|
local p
|
||||||
p = &c
|
p = &c
|
||||||
syscall(1, 1, p, 1)
|
syscall(1, 1, p, 1)
|
||||||
return
|
return
|
||||||
|
|
||||||
|
function puts
|
||||||
|
argument s
|
||||||
|
local len
|
||||||
|
len = strlen(s)
|
||||||
|
syscall(1, 1, s, len)
|
||||||
|
return
|
||||||
```
|
```
|
||||||
|
|
||||||
It's so simple compared to previous languages! Rather than mess around with registers, we can now
|
It's so simple compared to previous languages!
|
||||||
declare local (and global) variables, and use them directly. These variables will be placed on the
|
Importantly, functions now have arguments and return values.
|
||||||
|
Rather than mess around with registers, we can now
|
||||||
|
declare local (and global) variables, and use them directly.
|
||||||
|
These variables will be placed on the
|
||||||
stack. Since arguments are also placed on the stack,
|
stack. Since arguments are also placed on the stack,
|
||||||
by implementing local variables we get arguments for free. There is no difference
|
by implementing local variables we get arguments for free. There is no difference
|
||||||
between the `local` and `argument` keywords in this language other than spelling.
|
between the `local` and `argument` keywords in this language other than spelling.
|
||||||
In fact, the number of arguments to a function call is not checked against
|
In fact, the number of arguments to a function call is not checked against
|
||||||
how many arguments the function has. This does make it easy to screw things up by calling a function
|
how many arguments the function has. This does make it easy to screw things up by calling a function
|
||||||
with the wrong number of arguments, but it also means that we can provide a variable number of arguments
|
with the wrong number of arguments, but it also means that we can provide a variable number of arguments
|
||||||
to the `syscall` function. Speaking of which, if you look at the bottom of `in04b`, you'll see:
|
to the `syscall` function. Speaking of which, if you look at the bottom of `in04`, you'll see:
|
||||||
|
|
||||||
```
|
```
|
||||||
function syscall
|
function syscall
|
||||||
|
@ -53,6 +75,7 @@ Instead, `syscall` is a function written manually in machine language.
|
||||||
We can take a look at its decompilation to make things clearer:
|
We can take a look at its decompilation to make things clearer:
|
||||||
|
|
||||||
```
|
```
|
||||||
|
(...function prologue...)
|
||||||
mov rax,[rbp-0x10]
|
mov rax,[rbp-0x10]
|
||||||
mov rdi,rax
|
mov rdi,rax
|
||||||
mov rax,[rbp-0x18]
|
mov rax,[rbp-0x18]
|
||||||
|
@ -67,6 +90,7 @@ mov rax,[rbp-0x38]
|
||||||
mov r9,rax
|
mov r9,rax
|
||||||
mov rax,[rbp-0x8]
|
mov rax,[rbp-0x8]
|
||||||
syscall
|
syscall
|
||||||
|
(...function epilogue...)
|
||||||
```
|
```
|
||||||
|
|
||||||
This just sets `rax`, `rdi`, `rsi`, etc. to the arguments the function was called with,
|
This just sets `rax`, `rdi`, `rsi`, etc. to the arguments the function was called with,
|
||||||
|
@ -133,22 +157,29 @@ Note that setting `rsp` very specifically rather than just doing `sub rsp, 8` is
|
||||||
if we skip over some code with a local variable declaration, or execute a local declaration twice,
|
if we skip over some code with a local variable declaration, or execute a local declaration twice,
|
||||||
we want `rsp` to be in the right place.
|
we want `rsp` to be in the right place.
|
||||||
The first three and last three instructions above are called the function *prologue* and *epilogue*.
|
The first three and last three instructions above are called the function *prologue* and *epilogue*.
|
||||||
They are all the same for all functions; a prologue is generated at the start of every function,
|
They are the same for all functions; a prologue is generated at the start of every function,
|
||||||
and an epilogue is generated for every return statement.
|
and an epilogue is generated for every return statement.
|
||||||
The return value is placed in `rax`.
|
The return value is placed in `rax`.
|
||||||
|
|
||||||
## global variables
|
## global variables
|
||||||
|
|
||||||
Global variables are much simpler than local ones. The variable `:static_memory_end` in the compiler
|
Global variables are much simpler than local ones. The variable `:static_memory_end` in the compiler
|
||||||
keeps track of where to put the next global variable in memory. It is initialized at address `0x440000`,
|
keeps track of where to put the next global variable in memory. It is initialized at address `0x500000`,
|
||||||
which gives us 256KB for code (and strings). When a global variable is added, `:static_memory_end` is increased
|
which gives us 1MB for code (and strings). When a global variable is added, `:static_memory_end` is increased
|
||||||
by its size.
|
by its size.
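
The allocation scheme described above amounts to a bump allocator. The following is a minimal Python sketch of that idea, not the compiler's actual code: the class `StaticMemory` and the method `add_global` are hypothetical names, and the load address `0x400000` is an assumption inferred from the statement that `0x500000` leaves 1MB for code.

```python
CODE_BASE = 0x400000            # assumed load address (inferred, not stated in the README)
STATIC_MEMORY_START = 0x500000  # initial value of static_memory_end per the README


class StaticMemory:
    """Hypothetical model of how global variables are assigned addresses."""

    def __init__(self):
        self.static_memory_end = STATIC_MEMORY_START
        self.addresses = {}

    def add_global(self, name, size):
        # Declaring the same name twice is an error in the stage 04 language.
        if name in self.addresses:
            raise ValueError(f"{name} declared twice")
        address = self.static_memory_end
        self.addresses[name] = address
        # Bump the end pointer by the size of the new variable.
        self.static_memory_end += size
        return address


mem = StaticMemory()
first = mem.add_global("input_line", 32)
second = mem.add_global("secret_number", 8)
code_space = STATIC_MEMORY_START - CODE_BASE  # room left for code and strings
```

Under these assumptions the first global lands at `0x500000`, each later one directly follows the previous one, and `code_space` works out to `0x100000` bytes, which matches the "1MB for code" figure.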
|
||||||
|
|
||||||
|
## misc improvements
|
||||||
|
|
||||||
|
- Errors now give you the line number in decimal instead of hexadecimal.
|
||||||
|
- You get an error if you declare a label (or a variable) twice.
|
||||||
|
- Conditional jumping is much nicer: e.g. `if x == 3 goto some_label`
|
||||||
|
- Comments can now appear on lines with code.
|
||||||
|
- You don't need a `d` prefix for decimal numbers.
|
||||||
|
- You can control the input and output filenames with command-line arguments (by default, `in04` and `out04` are used).
|
||||||
|
|
||||||
## language description
|
## language description
|
||||||
|
|
||||||
Comments begin with `;` and may be put at the end of lines
|
Comments begin with `;`.
|
||||||
with or without code.
|
|
||||||
Blank lines are ignored.
|
|
||||||
|
|
||||||
To make the compiler simpler, this language doesn't support fancy
|
To make the compiler simpler, this language doesn't support fancy
|
||||||
expressions like `2 * (3 + 5) / 6`. There is a limited set of possible
|
expressions like `2 * (3 + 5) / 6`. There is a limited set of possible
|
||||||
|
@ -176,7 +207,7 @@ conditionally jump to the specified label. `{operator}` should be one of
|
||||||
- `{lvalue} |= {rvalue}`
|
- `{lvalue} |= {rvalue}`
|
||||||
- `{lvalue} ^= {rvalue}`
|
- `{lvalue} ^= {rvalue}`
|
||||||
- `{lvalue} <= {rvalue}` - left shift `lvalue` by `rvalue`
|
- `{lvalue} <= {rvalue}` - left shift `lvalue` by `rvalue`
|
||||||
- `{lvalue} >= {rvalue}` - right shift `lvalue` by `rvalue`
|
- `{lvalue} >= {rvalue}` - right shift `lvalue` by `rvalue` (unsigned)
|
||||||
- `{function}({term}, {term}, ...)` - function call, ignoring the return value
|
- `{function}({term}, {term}, ...)` - function call, ignoring the return value
|
||||||
- `return {rvalue}`
|
- `return {rvalue}`
|
||||||
- `string {str}` - places a literal string in the code
|
- `string {str}` - places a literal string in the code
|
||||||
|
@ -185,7 +216,7 @@ conditionally jump to the specified label. `{operator}` should be one of
|
||||||
Now let's get down into the weeds:
|
Now let's get down into the weeds:
|
||||||
|
|
||||||
A *number* is one of:
|
A *number* is one of:
|
||||||
- `{decimal number}` - e.g. `108` (note: there's no `d` prefix anymore)
|
- `{decimal number}` - e.g. `108`
|
||||||
- `0x{hexadecimal number}` - e.g. `0x2f` for 47
|
- `0x{hexadecimal number}` - e.g. `0x2f` for 47
|
||||||
- `'{character}` - e.g. `'a` for 97 (the character code for `a`)
|
- `'{character}` - e.g. `'a` for 97 (the character code for `a`)
|
||||||
|
|
||||||
|
@ -194,7 +225,7 @@ A *term* is one of:
|
||||||
- `.{label name}` - the address of a label
|
- `.{label name}` - the address of a label
|
||||||
- `{number}`
|
- `{number}`
|
||||||
|
|
||||||
An *lvalue* is the left-hand side of an assignment expression,
|
An *l-value* is the left-hand side of an assignment expression,
|
||||||
and it is one of:
|
and it is one of:
|
||||||
- `{variable}`
|
- `{variable}`
|
||||||
- `*1{variable}` - dereference 1 byte
|
- `*1{variable}` - dereference 1 byte
|
||||||
|
@ -202,8 +233,8 @@ and it is one of:
|
||||||
- `*4{variable}` - dereference 4 bytes
|
- `*4{variable}` - dereference 4 bytes
|
||||||
- `*8{variable}` - dereference 8 bytes
|
- `*8{variable}` - dereference 8 bytes
|
||||||
|
|
||||||
An *rvalue* is an expression, which can be more complicated than a term.
|
An *r-value* is an expression, which can be more complicated than a term.
|
||||||
rvalues are one of:
|
r-values are one of:
|
||||||
- `{term}`
|
- `{term}`
|
||||||
- `&{variable}` - address of variable
|
- `&{variable}` - address of variable
|
||||||
- `*1{variable}` / `*2{variable}` / `*4{variable}` / `*8{variable}` - dereference 1, 2, 4, or 8 bytes
|
- `*1{variable}` / `*2{variable}` / `*4{variable}` / `*8{variable}` - dereference 1, 2, 4, or 8 bytes
|
||||||
|
@ -218,7 +249,7 @@ rvalues are one of:
|
||||||
- `{term} | {term}`
|
- `{term} | {term}`
|
||||||
- `{term} ^ {term}`
|
- `{term} ^ {term}`
|
||||||
- `{term} < {term}` - left shift
|
- `{term} < {term}` - left shift
|
||||||
- `{term} > {term}` - right shift
|
- `{term} > {term}` - right shift (unsigned)
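
To show what "(unsigned)" means for the right shift, here is a small Python sketch contrasting a logical (unsigned) shift with an arithmetic (signed) one. It assumes 64-bit values and shift amounts from 0 to 63; both function names are hypothetical and exist only for illustration.

```python
MASK64 = (1 << 64) - 1  # assumption: values are 64 bits wide


def shift_right_unsigned(value, amount):
    """Logical shift: treat value as unsigned, so zeros enter from the top."""
    return (value & MASK64) >> amount


def shift_right_signed(value, amount):
    """Arithmetic shift, shown only for contrast: the sign bit is copied in."""
    value &= MASK64
    if value >> 63:          # top bit set: reinterpret as a negative number
        value -= 1 << 64
    return (value >> amount) & MASK64


minus_eight = -8 & MASK64    # bit pattern 0xFFFFFFFFFFFFFFF8
unsigned_result = shift_right_unsigned(minus_eight, 1)
signed_result = shift_right_signed(minus_eight, 1)
```

For non-negative values the two shifts agree; they differ only when the top bit is set, which is why the README calls out the unsigned behavior.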
|
||||||
|
|
||||||
That's quite a lot of stuff, and it makes for a pretty powerful
|
That's quite a lot of stuff, and it makes for a pretty powerful
|
||||||
language, all things considered. To test out the language,
|
language, all things considered. To test out the language,
|
||||||
|
@ -236,5 +267,5 @@ of branching in this language (`if ... goto ...` stands in for `if`, `else if`,
|
||||||
you need to use a lot of labels, and that means their names can get quite long. But at least unlike
|
you need to use a lot of labels, and that means their names can get quite long. But at least unlike
|
||||||
the 03 language, you'll get an error if you use the same label name twice!
|
the 03 language, you'll get an error if you use the same label name twice!
|
||||||
|
|
||||||
Overall, though, this language ended up being surprisingly powerful. With any luck, the next stage will
|
Overall, though, this language ended up being surprisingly powerful. With any luck, stage `05` will
|
||||||
finally be a C compiler...
|
finally be a C compiler... But first, it's time to make [something that's not a compiler](../04a/README.html).
|
|
@ -11,21 +11,21 @@ function main
|
||||||
local p_line
|
local p_line
|
||||||
p_line = &input_line
|
p_line = &input_line
|
||||||
secret_number = getrand(100)
|
secret_number = getrand(100)
|
||||||
fputs(1, .str_intro)
|
puts(.str_intro)
|
||||||
|
|
||||||
:guess_loop
|
:guess_loop
|
||||||
fputs(1, .str_guess)
|
puts(.str_guess)
|
||||||
syscall(0, 0, p_line, 30)
|
syscall(0, 0, p_line, 30)
|
||||||
guess = stoi(p_line)
|
guess = stoi(p_line)
|
||||||
if guess < secret_number goto too_low
|
if guess < secret_number goto too_low
|
||||||
if guess > secret_number goto too_high
|
if guess > secret_number goto too_high
|
||||||
fputs(1, .str_got_it)
|
puts(.str_got_it)
|
||||||
return 0
|
return 0
|
||||||
:too_low
|
:too_low
|
||||||
fputs(1, .str_too_low)
|
puts(.str_too_low)
|
||||||
goto guess_loop
|
goto guess_loop
|
||||||
:too_high
|
:too_high
|
||||||
fputs(1, .str_too_high)
|
puts(.str_too_high)
|
||||||
goto guess_loop
|
goto guess_loop
|
||||||
|
|
||||||
:str_intro
|
:str_intro
|
||||||
|
@ -61,7 +61,7 @@ function getrand
|
||||||
local n
|
local n
|
||||||
|
|
||||||
ptime = &getrand_time
|
ptime = &getrand_time
|
||||||
syscall(228, 1, ptime)
|
syscall(228, 0, ptime) ; clock_gettime(CLOCK_REALTIME, ptime)
|
||||||
ptime += 8 ; nanoseconds at offset 8 in struct timespec
|
ptime += 8 ; nanoseconds at offset 8 in struct timespec
|
||||||
n = *4ptime
|
n = *4ptime
|
||||||
n %= x
|
n %= x
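
For comparison, the corrected call can be modeled in Python. On x86-64 Linux, syscall 228 is `clock_gettime`, and clock id 0 is `CLOCK_REALTIME` (1 is `CLOCK_MONOTONIC`). This is only a sketch of the lines visible in this hunk, assuming a Unix system where `time.clock_gettime_ns` is available; the rest of `getrand` is not shown in the diff and is not reproduced here.

```python
import time


def getrand(x):
    """Sketch: derive a number in [0, x) from the nanosecond part of the clock."""
    # Equivalent of syscall(228, 0, ptime): clock_gettime(CLOCK_REALTIME, ...)
    total_ns = time.clock_gettime_ns(time.CLOCK_REALTIME)
    # The nanoseconds field of struct timespec (offset 8) is always below 10**9.
    tv_nsec = total_ns % 1_000_000_000
    # n %= x
    return tv_nsec % x


secret_number = getrand(100)
```

Taking the nanosecond field modulo `x` is a quick source of unpredictability for a guessing game, not a statistically sound random number generator.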
|
||||||
|
@ -128,6 +128,10 @@ function fputs
|
||||||
syscall(1, fd, s, length)
|
syscall(1, fd, s, length)
|
||||||
return
|
return
|
||||||
|
|
||||||
|
function puts
|
||||||
|
argument s
|
||||||
|
fputs(1, s)
|
||||||
|
return
|
||||||
|
|
||||||
function fputn
|
function fputn
|
||||||
argument fd
|
argument fd
|
|
@ -4,8 +4,8 @@ D=:global_variables
|
||||||
8C=D
|
8C=D
|
||||||
; initialize static_memory_end
|
; initialize static_memory_end
|
||||||
C=:static_memory_end
|
C=:static_memory_end
|
||||||
; 0x80000 = 512KB for code
|
; 0x100000 = 1MB for code
|
||||||
D=x480000
|
D=x500000
|
||||||
8C=D
|
8C=D
|
||||||
; initialize labels_end
|
; initialize labels_end
|
||||||
C=:labels_end
|
C=:labels_end
|
||||||
|
@ -1980,11 +1980,11 @@ align
|
||||||
x85
|
x85
|
||||||
|
|
||||||
:input_filename
|
:input_filename
|
||||||
str in04b
|
str in04
|
||||||
x0
|
x0
|
||||||
|
|
||||||
:output_filename
|
:output_filename
|
||||||
str out04b
|
str out04
|
||||||
x0
|
x0
|
||||||
|
|
||||||
:input_file_error
|
:input_file_error
|
4
Makefile
|
@ -3,15 +3,15 @@ all: markdown README.html
|
||||||
$(MAKE) -C 01
|
$(MAKE) -C 01
|
||||||
$(MAKE) -C 02
|
$(MAKE) -C 02
|
||||||
$(MAKE) -C 03
|
$(MAKE) -C 03
|
||||||
|
$(MAKE) -C 04
|
||||||
$(MAKE) -C 04a
|
$(MAKE) -C 04a
|
||||||
$(MAKE) -C 04b
|
|
||||||
clean:
|
clean:
|
||||||
$(MAKE) -C 00 clean
|
$(MAKE) -C 00 clean
|
||||||
$(MAKE) -C 01 clean
|
$(MAKE) -C 01 clean
|
||||||
$(MAKE) -C 02 clean
|
$(MAKE) -C 02 clean
|
||||||
$(MAKE) -C 03 clean
|
$(MAKE) -C 03 clean
|
||||||
|
$(MAKE) -C 04 clean
|
||||||
$(MAKE) -C 04a clean
|
$(MAKE) -C 04a clean
|
||||||
$(MAKE) -C 04b clean
|
|
||||||
rm -f markdown
|
rm -f markdown
|
||||||
rm -f README.html
|
rm -f README.html
|
||||||
markdown: markdown.c
|
markdown: markdown.c
|
||||||
|
|
|
@ -26,8 +26,8 @@ command codes.
|
||||||
- [stage 02](02/README.md) - a language with labels
|
- [stage 02](02/README.md) - a language with labels
|
||||||
- [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
|
- [stage 03](03/README.md) - a language with longer labels, better error messages, and less register manipulation
|
||||||
- more coming soon (hopefully)
|
- more coming soon (hopefully)
|
||||||
|
- [stage 04](04/README.md) - a language with nice functions and local variables
|
||||||
- [stage 04a](04a/README.md) - (interlude) a very simple preprocessor
|
- [stage 04a](04a/README.md) - (interlude) a very simple preprocessor
|
||||||
- [stage 04b](04b/README.md) - a language with nice functions and local variables
|
|
||||||
|
|
||||||
## prerequisite knowledge
|
## prerequisite knowledge
|
||||||
|
|
||||||
|
@ -114,4 +114,4 @@ shall not be held liable in connection with it.
|
||||||
## contributing
|
## contributing
|
||||||
|
|
||||||
If you notice a mistake/want to clarify something, you can submit a pull request
|
If you notice a mistake/want to clarify something, you can submit a pull request
|
||||||
via GitHub, or email `pommicket at pommicket.com`. Translations are welcome!
|
via GitHub, or email `pommicket at pommicket.com`.
|
||||||
|
|