readme edits

This commit is contained in:
pommicket 2021-11-10 00:52:34 -05:00
parent befd4a6435
commit 3255cd32d7
7 changed files with 196 additions and 63 deletions

View file

@ -5,13 +5,14 @@ takes input file `in00` containing space/newline/(any character)-separated
hexadecimal digit pairs (e.g. `3f`) and outputs them as bytes to the file hexadecimal digit pairs (e.g. `3f`) and outputs them as bytes to the file
`out00`. On 64-bit Linux, try running `./hexcompile` from this directory (I've `out00`. On 64-bit Linux, try running `./hexcompile` from this directory (I've
already provided an `in00` file, which you can take a look at), and you will get already provided an `in00` file, which you can take a look at), and you will get
a file named `out00` containing the text `Hello, world!`. This stage is needed a file named `out00` containing the text `Hello, world!`. This stage
so that you can use your favorite text editor to write executables by hand lets you use your favorite text editor to write executables
(which have bytes outside of ASCII/UTF-8). I wrote it with a program called (which have bytes outside of ASCII/UTF-8).
hexedit, which can be found on most Linux distributions. Only 64-bit Linux is I made `hexcompile` with a program called
supported, because each OS/architecture combination would need its own separate [hexedit](https://github.com/pixel/hexedit),
executable. The executable is just 632 bytes long, and you could definitely make which can be found in most Linux package managers.
it even smaller if you wanted to. Let's take a look at what's inside (`od -t x1 The executable is just 632 bytes long.
Let's take a look at what's inside (`od -t x1
-An -v hexcompile`): -An -v hexcompile`):
``` ```
@ -74,12 +75,12 @@ version of ELF)
- `02 00` Object type = executable file (not a dynamic library/etc.) - `02 00` Object type = executable file (not a dynamic library/etc.)
- `3e 00` Architecture x86-64 - `3e 00` Architecture x86-64
- `01 00 00 00` Version 1 of ELF, again - `01 00 00 00` Version 1 of ELF, again
- `78 00 40 00 00 00 00 00` **Entry point of the executable** = 0x400078 (explained later) - `78 00 40 00 00 00 00 00` **Entry point of the executable** = 0x400078
- `40 00 00 00 00 00 00 00` Program header table offset in bytes from start of file (see below) - `40 00 00 00 00 00 00 00` Program header table offset in bytes from start of file
- `00 00 00 00 00 00 00 00` Section header table offset (we're not using sections) - `00 00 00 00 00 00 00 00` Section header table offset (we're not using sections)
- `00 00 00 00` Flags (not important) - `00 00 00 00` Flags (not important to us)
- `40 00` The size of this header, in bytes = 64 - `40 00` The size of this header, in bytes = 64
- `38 00` Size of the program header (see below) = 56 - `38 00` Size of the program header = 56
- `01 00` Number of program headers = 1 - `01 00` Number of program headers = 1
- `00 00` Size of each section header (unused) - `00 00` Size of each section header (unused)
- `00 00` Number of section headers (unused) - `00 00` Number of section headers (unused)
@ -88,8 +89,8 @@ version of ELF)
You might notice that all the numbers are backwards, e.g. `38 00` for the number You might notice that all the numbers are backwards, e.g. `38 00` for the number
0x0038 (56 decimal). This is because almost all modern architectures (including 0x0038 (56 decimal). This is because almost all modern architectures (including
x86-64) are little-endian, meaning that the *least significant byte* goes first, x86-64) are little-endian, meaning that the *least significant byte* goes first,
and the most significant byte goes last. There are various reasons why this is and the most significant byte goes last.
easier to deal with, but I won't explain that here. There are reasons for this ([see here](https://en.wikipedia.org/wiki/Endianness#Optimization), for example, if you're interested).
## program header ## program header
The program header describes a segment of data that is loaded into memory when The program header describes a segment of data that is loaded into memory when
@ -108,11 +109,12 @@ Without further ado, here's the contents of the program header:
**wait a minute, what's that?** **wait a minute, what's that?**
We just specified the *virtual address* of this segment. This is the virtual This is the virtual
memory address that the segment will be loaded to. Virtual memory means that memory address that the segment will be loaded to.
addresses in our program do not actually correspond to where the memory is Nowadays, computers use virtual memory, meaning that
physically stored in RAM, with the CPU translating between virtual and physical addresses in our program don't actually correspond to where the memory is
memory addresses. There are many reasons for this: making sure each process has physically stored in RAM (the CPU translates between virtual and physical
memory addresses). There are many reasons for this: making sure each process has
its own memory space, memory protection, etc. You can read more about it its own memory space, memory protection, etc. You can read more about it
elsewhere. elsewhere.
@ -129,11 +131,10 @@ that is a multiple of 4096. Our program needs to be loaded into a memory page,
so its *virtual address* needs to be a multiple of 4096. We're using `0x400000`. so its *virtual address* needs to be a multiple of 4096. We're using `0x400000`.
But wait! Didn't we use `0x400078` for the virtual address? Well, yes but that's But wait! Didn't we use `0x400078` for the virtual address? Well, yes but that's
because the *data in the file* is loaded to address `0x400078`. The actual page because the *data in the file* is loaded to address `0x400078`. The actual page
of memory that the OS will allocate for our code will start at `0x400000`. The of memory that the OS will allocate for our segment will start at `0x400000`. The
reason we need to start `0x78` bytes in is that Linux expects the data *in the reason we need to start `0x78` bytes in is that Linux expects the data in the
file* to be at the same position in the page as when it will be loaded, and it file to be at the same position in the page as when it will be loaded, and it
appears at offset `0x78` in our file. But don't worry if you don't understand appears at offset `0x78` in our file.
that.
## the code ## the code
@ -163,16 +164,17 @@ about it with `man 2 open`.
The first argument, `0x40026d`, is a pointer to some data at the very end of The first argument, `0x40026d`, is a pointer to some data at the very end of
this segment (see further down). Specifically, it holds the bytes this segment (see further down). Specifically, it holds the bytes
`69 6e 30 30 00`, the null-terminated ASCII string `"in00"`. `69 6e 30 30 00`, the null-terminated ASCII string `"in00"`.
This indicates the name of the file. The second argument (`O_RDONLY`, or 0) This indicates the name of the file. The second argument, `0`,
specifies that we will be reading from this file. There is a third argument to specifies that we will (only) be reading from this file. There is a third argument to
this syscall (we'll get to it later), but it's not applicable here so we don't this syscall (we'll get to it later), but it's not applicable here so we don't
set it. set it.
This call gives us back a *file descriptor*, which can be used to read from the This call gives us back a *file descriptor*, a number which we can use to read from the
file, in register `rax`. But we don't actually need to look at what file file, in register `rax`. But we don't actually need to look at what file
descriptor Linux gave us. This is because Linux assigns file descriptor numbers descriptor Linux gave us. This is because Linux assigns file descriptor numbers
sequentially, starting from `0` for standard input, `1` for standard output, `2` sequentially, starting from
for standard error, and then `3, 4, 5, ...` for any files our program opens. So [0 for stdin, 1 for stdout, 2 for stderr](https://en.wikipedia.org/wiki/Standard_streams),
and then 3, 4, 5, ... for any files our program opens. So
this file, the first one our program opens, will have descriptor `3`. this file, the first one our program opens, will have descriptor `3`.
Now we open our output file: Now we open our output file:
@ -194,8 +196,8 @@ file (`O_WRONLY = 0x01`), that we want to create it if it doesn't exist
(`O_TRUNC = 0x200`). Secondly, we are setting the third argument this time. It (`O_TRUNC = 0x200`). Secondly, we are setting the third argument this time. It
specifies the permissions our file is created with (`0o755` means user specifies the permissions our file is created with (`0o755` means user
read/write/execute, group/other read/execute). This is not very important to read/write/execute, group/other read/execute). This is not very important to
the actual execution of the program, so don't worry if you don't know what it the actual execution of the program, so don't worry if you don't know
means. about UNIX permissions.
Now we can start reading from the file. We're going to loop back to this part of Now we can start reading from the file. We're going to loop back to this part of
the code every time we want to read a new hexadecimal number from the input the code every time we want to read a new hexadecimal number from the input
@ -210,7 +212,7 @@ file.
- `0f 05` `syscall` - `0f 05` `syscall`
In C, this is `read(3, 0x40026a, 3)`. Here we call syscall #0, `read`, with In C, this is `read(3, 0x40026a, 3)`. Here we call syscall #0, `read`, with
arguments: three arguments:
- `fd = 3` This is the descriptor number of our input file. - `fd = 3` This is the descriptor number of our input file.
- `buf = 0x40026a` This is the memory address we want Linux to output the data - `buf = 0x40026a` This is the memory address we want Linux to output the data
@ -221,7 +223,6 @@ We're telling Linux to output to `0x40026a`, which is just a part of this
segment (see further down). Normally you would read to a different segment of segment (see further down). Normally you would read to a different segment of
the program from where the code is, but we want this to be as simple as the program from where the code is, but we want this to be as simple as
possible. possible.
The number of bytes *actually read*, taking into account that we might have The number of bytes *actually read*, taking into account that we might have
reached the end of the file, is stored in `rax`. reached the end of the file, is stored in `rax`.
@ -231,8 +232,8 @@ reached the end of the file, is stored in `rax`.
- `0f 8f 50 01 00 00` `jg 0x400250` - `0f 8f 50 01 00 00` `jg 0x400250`
This tells the CPU to jump to a later part of the code (address `0x400250`) if 3 This tells the CPU to jump to a later part of the code (address `0x400250`) if 3
is greater than the number of bytes read in (in other words, if we reached the is greater than the number of bytes we got, in other words, if we reached the
end of the file). Note that we don't specifiy the *address* to jump to, but end of the file. Note that we don't specifiy the *address* to jump to, but
instead the *relative address*, relative to the first byte after the jump instead the *relative address*, relative to the first byte after the jump
instruction (so here we're saying to jump `0x150` bytes forward). There are instruction (so here we're saying to jump `0x150` bytes forward). There are
reasons for this which I won't get into here. reasons for this which I won't get into here.
@ -299,7 +300,7 @@ the one above:
- `48 89 fb` `mov rbx, rdi` - `48 89 fb` `mov rbx, rdi`
- `48 09 d8` `or rax, rbx` - `48 09 d8` `or rax, rbx`
Okay, now we have the byte specified by the two hex digits we read in `rax`. Okay, now `rax` contains the byte specified by the two hex digits we read.
- `48 89 c3` `mov rbx, rax` - `48 89 c3` `mov rbx, rax`
- `48 b8 6c 02 40 00 00 00 00 00` `mov rax, 0x40026c` - `48 b8 6c 02 40 00 00 00 00 00` `mov rax, 0x40026c`
@ -343,8 +344,8 @@ These bytes aren't actually used by our program, and could be set to anything.
These are here because I wasn't sure how long the program would be when I These are here because I wasn't sure how long the program would be when I
started, so I just set the segment size to 512 bytes, which turned out to be started, so I just set the segment size to 512 bytes, which turned out to be
more than enough. I could have cut these out and edited all the addresses to get more than enough. I could have cut these out and edited all the addresses to get
a smaller, cleaner executable, but I'm leaving them in because that's what you a smaller executable, but really there's no point—modern
probably would do if you were doing this for non-instructional purposes. computers can definitely handle 600-byte files.
- `31 c0` `mov rax, 0` - `31 c0` `mov rax, 0`
- `48 89 c7` `mov rdi, rax` - `48 89 c7` `mov rdi, rax`
@ -355,7 +356,7 @@ This is where we conditionally jumped to way back when we determined if we
reached the end of the file. This calls syscall #60, `exit`, with one argument, reached the end of the file. This calls syscall #60, `exit`, with one argument,
0 (exit code 0, indicating we exited successfully). 0 (exit code 0, indicating we exited successfully).
You'd normally close the files first (with syscall #3), to tell Linux you're Normally, you should close files descriptors (with syscall #3), to tell Linux you're
done with them, but we don't need to. It'll automatically close all our open done with them, but we don't need to. It'll automatically close all our open
file descriptors when our program exits. file descriptors when our program exits.
@ -373,8 +374,8 @@ editor and get them translated into a binary file.
There are many ways in which this is a bad program. It will *only* properly There are many ways in which this is a bad program. It will *only* properly
handle lowercase hexadecimal digit pairs, separated by exactly one character, handle lowercase hexadecimal digit pairs, separated by exactly one character,
with a terminating character. What's worse, a bad input file (maybe you with a terminating character. What's worse, a bad input file (maybe someone
accidentally write `3F` instead of `3f`) won't print out a nice error message, accidentally writes `3F` instead of `3f`) won't print out a nice error message,
but instead continue processing as usual, without any indication that anything's but instead continue processing as usual, without any indication that anything's
gone wrong, giving you an unexpected result. gone wrong, giving you an unexpected result.
Also, we only read in data *three bytes at a time*, and output one byte at a Also, we only read in data *three bytes at a time*, and output one byte at a
@ -385,4 +386,5 @@ a while.
But these problems aren't really a big deal. We'll only be running this on But these problems aren't really a big deal. We'll only be running this on
little programs and we'll be sure to check that our input is in the right little programs and we'll be sure to check that our input is in the right
format. And with that, we are ready to move on to the next stage... format. And with that, we are ready to move on to the
[next stage...](../01/README.md).

View file

@ -3,7 +3,7 @@
The code for the compiler for this stage is in the file `in00`. And yes, that's The code for the compiler for this stage is in the file `in00`. And yes, that's
an input to our previous program, `hexcompile`, from stage 00! To compile it, an input to our previous program, `hexcompile`, from stage 00! To compile it,
run `../00/hexcompile` from this directory. You will get a file, `out00`. That run `../00/hexcompile` from this directory. You will get a file, `out00`. That
is the executable for this stage's compiler. Run it (it will read from the file is the executable for this stage's compiler. Run it (it'll read from the file
`in01` I've provided) and you'll get a file `out01`. That executable will print `in01` I've provided) and you'll get a file `out01`. That executable will print
`Hello, world!` when run. Let's take a look at the input we're providing to the `Hello, world!` when run. Let's take a look at the input we're providing to the
stage 01 compiler, `in01`: stage 01 compiler, `in01`:
@ -98,10 +98,9 @@ I decided to put the data before the code this time (it made it a bit
easier to work with), so we start a little bit later than the first byte in our easier to work with), so we start a little bit later than the first byte in our
segment. Second and more importantly, rather than 512 bytes, our segment is segment. Second and more importantly, rather than 512 bytes, our segment is
0x21000 = 135168 bytes long! That's because we're storing a table of all the 0x21000 = 135168 bytes long! That's because we're storing a table of all the
commands our compiler supports. This table has one eight-byte entry for each commands our compiler supports. This table has one 8-byte entry for each
pair of ASCII characters. There are 128 ASCII characters, so that means it's pair of ASCII characters. There are 128 ASCII characters, so that means it's
`128 * 128 * 8 = 131072` bytes long. And there's a little bit of extra room for `128 * 128 * 8 = 131072` bytes long. This large source file means that compiling our
the actual code. Funnily enough, this large source file means that compiling our
stage 01 compiler isn't instantaneous (remember how I said reading 3 bytes at a stage 01 compiler isn't instantaneous (remember how I said reading 3 bytes at a
time would be slow?). On my system, it takes 0.13 seconds to run time would be slow?). On my system, it takes 0.13 seconds to run
`../00/hexcompile`. `../00/hexcompile`.
@ -150,7 +149,7 @@ This opens our output file, just like last time.
- `0f 05` `syscall` - `0f 05` `syscall`
Here we read two bytes from our input file into memory address `0x400083`. Note Here we read two bytes from our input file into memory address `0x400083`. Note
that this corresponds to those two blanks bytes at the start of our error that this corresponds to those two blank bytes at the start of our error
message. message.
- `48 89 c3` `mov rbx, rax` - `48 89 c3` `mov rbx, rax`
@ -326,8 +325,8 @@ very end:
This is at the position for `||`, and it contains an ELF header. One thing you This is at the position for `||`, and it contains an ELF header. One thing you
might notice is that we decided that each entry is 8 bytes long, but this one is might notice is that we decided that each entry is 8 bytes long, but this one is
0x79 = 121 bytes long! It's okay, our code doesn't actually check that we're 0x79 = 121 bytes long! It's okay, our code doesn't check if we use more
using less than 8 bytes of data, but it means that the entries for certain than 8 bytes of data, but it means that the entries for certain
commands, e.g. `}\n` will land right in the middle of the data for the ELF commands, e.g. `}\n` will land right in the middle of the data for the ELF
header. But by a lucky coincidence, all those entries actually land on 0 bytes, header. But by a lucky coincidence, all those entries actually land on 0 bytes,
so they'll just be treated as unrecognized (as they should be). so they'll just be treated as unrecognized (as they should be).
@ -338,7 +337,8 @@ Like our last program, this one will be slow for large files. Again, that isn't
much of a problem for us. Also, if you forget a `;` at the end of a file, it'll much of a problem for us. Also, if you forget a `;` at the end of a file, it'll
loop infinitely rather than printing a nice error message. I could have loop infinitely rather than printing a nice error message. I could have
fixed this, but frankly I've had enough of writing code in hexadecimal. So let's fixed this, but frankly I've had enough of writing code in hexadecimal. So let's
move on to stage 02, now that we have a nicer language on our hands. From now move on to [stage 02](../02/README.md),
now that we have a nicer language on our hands. From now
on, since we have comments, I'm gonna do most of the explaining in the source file on, since we have comments, I'm gonna do most of the explaining in the source file
itself, rather than the README. But there'll still be a bit of stuff there each itself, rather than the README. But there'll still be a bit of stuff there each
time. time.

View file

@ -1,2 +1,7 @@
all: out01 out02 README.html
out01: in01 out01: in01
../01/out00 ../01/out00
out02: out01
./out01
%.html: %.md ../markdown
../markdown $<

110
02/README.md Normal file
View file

@ -0,0 +1,110 @@
# stage 02
The compiler for this stage is in the file `in01`, an input for our previous compiler.
The specifics of how this compiler works are in the comments in that file, but here I'll
give an overview.
Let's take a look at `in02`, an example input file for this compiler:
```
jm
:-co jump to code
::hw
'H
'e
'l
'l
'o
',
'
'w
'o
'r
'l
'd
'!
\n
::he end of hello world
::co start of code
//
// now we'll calculate the length of the hello world string
// by subtracting hw from he.
//
im
--he
BA
im
--hw
nA
+B
DA put length in rdx
// okay now we can write it
im
##1.
JA set rdi to 1 (stdout)
im
--hw
IA set rsi to a pointer to "Hello, world!\n"
im
##1. write
sy
im
##0. exit code 0
JA
im
##3c. exit = syscall 0x3c
sy
```
You can try adding more characters to the hello world message, and it'll just work;
the length of the text is computed automatically!
This time, commands are separated by newlines instead of semicolons.
Each line begins with a 2-character command identifier. There are some special identifiers though:
- `::` marks a *label*
- `--` outputs a label's (absolute) address
- `:-` outputs a label's relative address
- `##` outputs a number
All other commands work like they did in the previous compiler—if you scroll down in the
`in01` source file, you'll see the full command table.
## labels
Labels are the most important new feature of this language.
## two passes?
## other features
Now instead of writing out each of the 8 bytes making up a number,
we can just write it in hexadecimal (e.g. `##3c.` for `3c 00 00 00 00 00 00 00`),
and the compiler will automatically
extend it to 8 bytes.
This is especially nice because we don't need to write numbers backwards
for little-endianness anymore!
Numbers cannot appear at the end of a line (this was
to make the compiler simpler to write), so I'm adding a `.` at the end of
each one to avoid making that mistake.
Anything after a command is treated as a comment;
additionally `//` can be used for comments on their own lines.
I decided to implement them as simply as possible:
I just added the command `//` to the command table, which outputs the byte `0x90`—this
means "do nothing" (`nop`) in x86-64.
Note that this means that the following code will not work as expected:
```
im
// load the value 0x333 into rax
##333.
```
since `0x90` gets inserted between the "load immediate" instruction code, and the immediate.
## limitations
Many of the limitations of our previous compilers apply to this one. Also,
if you use a label without defining it, it uses address 0, rather than outputting
an error message. This could be fixed: if the value in the label table is 0, and if we are
on the second pass, output an error message. This compiler was already tedious enough
to implement, though!
But thanks to labels, for future compilers at least we won't have to calculate
any jump offsets manually.

View file

@ -1,6 +1,7 @@
all: markdown README.html all: markdown README.html
$(MAKE) -C 00 $(MAKE) -C 00
$(MAKE) -C 01 $(MAKE) -C 01
$(MAKE) -C 02
markdown: markdown.c markdown: markdown.c
$(CC) -O2 -o markdown -Wall -Wconversion -Wshadow -std=c89 markdown.c $(CC) -O2 -o markdown -Wall -Wconversion -Wshadow -std=c89 markdown.c
README.html: markdown README.md README.html: markdown README.md

View file

@ -3,9 +3,9 @@
Compilers nowadays are written in languages like C, which themselves need to be Compilers nowadays are written in languages like C, which themselves need to be
compiled. But then, you need a C compiler to compile your C compiler! Of course, compiled. But then, you need a C compiler to compile your C compiler! Of course,
the very first C compiler was not written in C (because how would it be the very first C compiler was not written in C (because how would it be
compiled?). Instead, it was slowly built up, starting from a very basic compiled?). Instead, it was built up over time, starting from a very basic
assembler, eventually reacing a full-scale compiler. This process is known as assembler, eventually reaching a full-scale compiler.
bootstrapping. In this repository, we'll explore how that's done. Each directory In this repository, we'll explore how that's done. Each directory
represents a new "stage" in the process. The first one, `00`, is a hand-written represents a new "stage" in the process. The first one, `00`, is a hand-written
executable, and the last one will be a C compiler. Each directory has its own executable, and the last one will be a C compiler. Each directory has its own
README explaining what's going on. README explaining what's going on.
@ -13,15 +13,21 @@ README explaining what's going on.
You can run `bootstrap.sh` to run through and test every stage. You can run `bootstrap.sh` to run through and test every stage.
To get HTML versions of all README pages, run `make`. To get HTML versions of all README pages, run `make`.
## the basics Note that the executables produced in this series will only run on
64-bit Linux, because each OS/architecture combination would need its own separate
executable.
The README for the first stage is [here](00/README.md).
## prerequisite knowledge
In this series, I want to explain *everything* that's going on. I'm going to In this series, I want to explain *everything* that's going on. I'm going to
need to assume some passing knowledge, so here's a quick overview of what you'll need to assume some passing knowledge, so here's a quick overview of what you'll
want to know before starting. I can't explain everything so you may need to do want to know before starting.
your own research. You don't need to understand each of these in full, just get You don't need to understand everything about each of these, just get
a general idea at least: a general idea:
- what an operating system is - what a system call is
- what memory is - what memory is
- what a programming language is - what a programming language is
- what a compiler is - what a compiler is
@ -35,7 +41,7 @@ decimal.
- bits, bytes, kilobytes, etc. - bits, bytes, kilobytes, etc.
- bitwise operations (not, or, and, xor, left shift, right shift) - bitwise operations (not, or, and, xor, left shift, right shift)
- 2's complement - 2's complement
- null-terminated strings - ASCII, null-terminated strings
- how pointers work - how pointers work
- how floating-point numbers work - how floating-point numbers work
- maybe some basic Intel-style x86-64 assembly (you can probably pick it up on - maybe some basic Intel-style x86-64 assembly (you can probably pick it up on
@ -73,7 +79,7 @@ with itself, we'll get the same executable either way.
In 1984, Ken Thompson wrote the well-known article In 1984, Ken Thompson wrote the well-known article
[Reflections on Trusting Trust](http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf). [Reflections on Trusting Trust](http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf).
This is one of the things that inspired me to start this project. To summarize This is one of the inspirations for this project. To summarize
the article: it is possible to create a malicious C compiler which will the article: it is possible to create a malicious C compiler which will
replicate its own malicious functionalities (e.g. detecting password-checking replicate its own malicious functionalities (e.g. detecting password-checking
routines to make them also accept another password the attacker knows) when used routines to make them also accept another password the attacker knows) when used

View file

@ -57,7 +57,8 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
break; break;
case '[': { case '[': {
/* link */ /* link */
const char *label, *url, *label_end, *url_end; char url2[256] = {0};
const char *label, *url, *label_end, *url_end, *dot;
int n_label, n_url; int n_label, n_url;
label = p+1; label = p+1;
@ -79,8 +80,16 @@ static void output_md_text(FILE *out, int *flags, int line_number, const char *t
n_label = (int)(label_end - label); n_label = (int)(label_end - label);
n_url = (int)(url_end - url); n_url = (int)(url_end - url);
fprintf(out, "<a href=\"%.*s\" target=\"_blank\">%.*s</a>", if (n_url > sizeof url2-8)
n_url, url, n_label, label); n_url = sizeof url2-8;
sprintf(url2, "%.*s", n_url, url);
dot = strrchr(url2, '.');
if (dot && strcmp(dot, ".md") == 0) {
/* replace links to md files with links to html files */
strcpy(dot, ".html");
}
fprintf(out, "<a href=\"%s\" target=\"_blank\">%.*s</a>",
url2, n_label, label);
p = url_end; p = url_end;
} break; } break;
case '-': case '-':
@ -150,7 +159,7 @@ int main(int argc, char **argv) {
"<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n" "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n"
"<meta charset=\"utf-8\">\n" "<meta charset=\"utf-8\">\n"
"<style>\n" "<style>\n"
"body { font-family: sans-serif; }\n" "body { font-family: serif; }\n"
"</style>\n" "</style>\n"
"<title>%s</title>\n" "<title>%s</title>\n"
"</head>\n" "</head>\n"