Introduction
BBC BASIC programs are internally stored in a tokenised format. Certain components of a line of program code, such as keywords, are replaced with a single-byte token. For example, the keyword ENDPROC is represented by the single byte value &E1. This saves space (ENDPROC only takes up 1 byte instead of 7) and allows the interpreter to recognise keywords more efficiently.
Tokeniser Process
The tokeniser reads the input string from left to right and substitutes recognised keywords with their tokens (as listed in the Keyword Tokens section below).
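As a rough illustration of the substitution step alone, the following C sketch scans a line from left to right and replaces each keyword it recognises with its token byte. The names `Keyword`, `keywords` and `tokenise_simple` are hypothetical, and the three-entry table is only an excerpt using token values from the Keyword Tokens table. The sketch deliberately ignores the flags and special symbols described in the following sections, and it assumes that longer keywords are tried before their prefixes so that ENDPROC is not matched as END.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table entry: keyword text and its token byte. */
typedef struct {
    const char *text;
    unsigned char token;
} Keyword;

/* Small excerpt of the keyword table. Longer keywords are listed first so
   that ENDPROC is matched before END (an assumption of this sketch). */
static const Keyword keywords[] = {
    { "ENDPROC", 0xE1 },
    { "CLS",     0xDB },
    { "END",     0xE0 },
};

/* Copies src to dst, replacing recognised keywords with their tokens.
   dst needs room for strlen(src) bytes, since the output is never longer
   than the input. Returns the number of bytes written. */
size_t tokenise_simple(const char *src, unsigned char *dst)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        size_t i;
        int matched = 0;
        for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
            size_t len = strlen(keywords[i].text);
            if (strncmp(src, keywords[i].text, len) == 0) {
                dst[out++] = keywords[i].token; /* emit the one-byte token */
                src += len;                     /* skip the keyword text */
                matched = 1;
                break;
            }
        }
        if (!matched) {
            dst[out++] = (unsigned char)*src++; /* copy ordinary characters */
        }
    }
    return out;
}
```

With this excerpt, the input `CLS:END:ENDPROC` shrinks from 15 characters to 5 bytes, which is the space saving described in the introduction.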
Keyword Flags
The tokeniser maintains a state which controls the way that it tokenises statements. Each keyword has a number of flags which can be used to control the state of the tokeniser.
- Conditional flag. If this is set then the keyword is not tokenised if it is followed by an alphanumeric character. This would allow, for example, TIMER to be used as a variable name as the TIME part of it would not be tokenised.
- Middle flag. If encountered, the tokeniser is put into middle of statement mode after the keyword has been tokenised.
- Start flag. If encountered, the tokeniser is put into start of statement mode after the keyword has been tokenised. Being at the start of a statement affects the way pseudo-variables and star commands are tokenised.
- FN/PROC flag. If this is set then the name immediately following the token is not tokenised. For example, this would allow you to declare a PROC_PRINT as the PRINT component would not be tokenised.
- Line number flag. If encountered, any numbers that follow are tokenised as line numbers. This mode is disabled when a character other than a digit, comma or space is encountered.
- REM flag. If encountered then the remainder of the line is left untokenised.
- Pseudo-variable flag. If this is set then &40 is added to the token value if the keyword was found at the start of a statement. This gives pseudo-variables (such as TIME) two different tokens: one when used as a statement (for example &D1 for TIME), the other when used as a function (&91 for TIME).
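The flags above could be modelled as a bit mask, as in the following C sketch. The `FLAG_*` names and the three helper functions are hypothetical. The bit positions are an inference from the eight-character Flags strings in the Keyword Tokens table (for example `-P----MC`), read as bit 7 on the left down to bit 0 on the right; they are not confirmed by the text. The sketch shows only the Conditional and Pseudo-variable rules, and it takes "start of a statement" as the condition for adding &40, following the Start flag description.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bit assignments, inferred from the column order of the
   Flags strings in the Keyword Tokens table. */
enum {
    FLAG_CONDITIONAL = 1 << 0, /* C */
    FLAG_MIDDLE      = 1 << 1, /* M */
    FLAG_START       = 1 << 2, /* S */
    FLAG_FN_PROC     = 1 << 3, /* F */
    FLAG_LINE_NUMBER = 1 << 4, /* L */
    FLAG_REM         = 1 << 5, /* R */
    FLAG_PSEUDO_VAR  = 1 << 6  /* P */
};

/* Parses a flag string such as "-P----MC". The leftmost character is taken
   as bit 7; any character other than '-' sets the corresponding bit. */
unsigned parse_flags(const char *s)
{
    unsigned flags = 0;
    int i;
    assert(s != NULL);
    for (i = 0; i < 8 && s[i] != '\0'; i++) {
        if (s[i] != '-') {
            flags |= 1u << (7 - i);
        }
    }
    return flags;
}

/* Conditional flag: returns 0 if the keyword must be left untokenised
   because the character after it is alphanumeric, otherwise 1. */
int should_tokenise(unsigned flags, char next)
{
    if ((flags & FLAG_CONDITIONAL) && isalnum((unsigned char)next)) {
        return 0;
    }
    return 1;
}

/* Pseudo-variable flag: adds &40 to the token at the start of a statement. */
unsigned char final_token(unsigned char token, unsigned flags, int at_statement_start)
{
    if ((flags & FLAG_PSEUDO_VAR) && at_statement_start) {
        return (unsigned char)(token + 0x40);
    }
    return token;
}
```

For TIME (token &91, flags `-P----MC`), `should_tokenise` rejects the keyword when it is followed by `R`, so TIMER stays a variable name, and `final_token` yields &D1 at the start of a statement and &91 elsewhere.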
Other Symbols
The following symbols also affect tokenising:
- & - skips tokenising the following hex number. For example, the DEF in &DEF is not tokenised.
- " - skips tokenising the following string constant.
- : - puts the tokeniser into start of statement mode.
- * - disables the tokeniser if encountered at the start of the statement.
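The first two rules in this list amount to copying a literal through unchanged. The C sketch below measures how many characters such a literal occupies so that a tokeniser could skip over it. The function name `literal_length` is hypothetical. Two details are assumptions rather than statements from the text: hex digits are taken to be 0-9 and upper-case A-F only, and an unterminated string is taken to run to the end of the line. The `:` and `*` rules are not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 for the characters this sketch treats as hex digits
   (0-9 and upper-case A-F; an assumption). */
static int is_hex_digit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

/* Returns how many characters, starting at s, should be copied without
   tokenising. Returns 0 if s does not start a hex number or a string. */
size_t literal_length(const char *s)
{
    size_t n;
    assert(s != NULL);
    if (s[0] == '&') {
        n = 1;                       /* the ampersand itself */
        while (is_hex_digit(s[n])) {
            n++;                     /* following hex digits */
        }
        return n;
    }
    if (s[0] == '"') {
        n = 1;                       /* opening quote */
        while (s[n] != '\0' && s[n] != '"') {
            n++;                     /* string contents */
        }
        if (s[n] == '"') {
            n++;                     /* closing quote, if present */
        }
        return n;
    }
    return 0;
}
```

For `&DEF+1` the function returns 4, so the DEF inside the hex number is never offered to the keyword matcher.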
Line Numbers
Certain keywords are followed by a line number. These line numbers are tokenised to speed up decoding. However, each byte of a tokenised line number must be below &80 (so that it cannot be mistaken for another token) and above &20 (so that it cannot be &0D, the program line terminator), which makes converting a line number into its tokenised form a relatively complex procedure.
Each tokenised line number is four bytes in size, and is made up like this:
- Byte 0 - &8D, the line number token.
- Byte 2 - bits 0-5 are bits 0-5 of the LSB of the line number; bit 6 is 1 and bit 7 is 0.
- Byte 3 - bits 0-5 are bits 0-5 of the MSB of the line number; bit 6 is 1 and bit 7 is 0.
Byte 1 is more complicated and is formed as shown:
Byte 1 Bit | Value |
---|---|
0 | 0 |
1 | 0 |
2 | bit 6 of MSB (inverted) |
3 | bit 7 of MSB |
4 | bit 6 of LSB (inverted) |
5 | bit 7 of LSB |
6 | 1 |
7 | 0 |
In a C-like programming language the conversion may be carried out using the following code:
    ushort line = 1234; // Line number to convert.
    byte byte0 = 0x8D;
    byte byte1 = (((line & 0x00C0) >> 2) | ((line & 0xC000) >> 12)) ^ 0x54;
    byte byte2 = ((line >> 0) & 0x3F) | 0x40;
    byte byte3 = ((line >> 8) & 0x3F) | 0x40;
The process can be reversed to retrieve the original line number.
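One way the reverse conversion might look is sketched below in C, alongside the forward conversion so that the two can be checked against each other. The function names `encode_line_number` and `decode_line_number` are hypothetical. The encoder repeats the arithmetic given above; the decoder is derived from the byte layout described in this section rather than taken from any interpreter source.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes a line number into its four-byte tokenised form. */
void encode_line_number(uint16_t line, uint8_t out[4])
{
    assert(out != 0);
    out[0] = 0x8D; /* line number token */
    out[1] = (uint8_t)((((line & 0x00C0) >> 2) | ((line & 0xC000) >> 12)) ^ 0x54);
    out[2] = (uint8_t)((line & 0x3F) | 0x40);        /* low six bits of LSB */
    out[3] = (uint8_t)(((line >> 8) & 0x3F) | 0x40); /* low six bits of MSB */
}

/* Decodes a four-byte tokenised line number back into its value.
   in[0] is expected to be the &8D token and is not examined here. */
uint16_t decode_line_number(const uint8_t in[4])
{
    uint8_t top, lsb, msb;
    assert(in != 0);
    /* Undo the XOR so that bits 4-5 hold bits 6-7 of the LSB and
       bits 2-3 hold bits 6-7 of the MSB. */
    top = (uint8_t)(in[1] ^ 0x54);
    lsb = (uint8_t)((in[2] & 0x3F) | ((top & 0x30) << 2));
    msb = (uint8_t)((in[3] & 0x3F) | ((top & 0x0C) << 4));
    return (uint16_t)(lsb | (msb << 8));
}
```

As a worked example, line 1234 (&04D2) encodes to the bytes &8D &64 &52 &44, and decoding those bytes gives 1234 again.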
Keyword Tokens
Token values are given in hexadecimal. In the Flags column each letter marks a flag from the Keyword Flags section: C = Conditional, M = Middle, S = Start, F = FN/PROC, L = Line number, R = REM, P = Pseudo-variable. A dash means the flag is not set.

Token | Keyword | Flags |
---|---|---|
80 | AND | -------- |
81 | DIV | -------- |
82 | EOR | -------- |
83 | MOD | -------- |
84 | OR | -------- |
85 | ERROR | -----S-- |
86 | LINE | -------- |
87 | OFF | -------- |
88 | STEP | -------- |
89 | SPC | -------- |
8A | TAB( | -------- |
8B | ELSE | ---L-S-- |
8C | THEN | ---L-S-- |
8D | line no. | -------- |
8E | OPENIN | -------- |
8F | PTR | -P----MC |
90 | PAGE | -P----MC |
91 | TIME | -P----MC |
92 | LOMEM | -P----MC |
93 | HIMEM | -P----MC |
94 | ABS | -------- |
95 | ACS | -------- |
96 | ADVAL | -------- |
97 | ASC | -------- |
98 | ASN | -------- |
99 | ATN | -------- |
9A | BGET | -------C |
9B | COS | -------- |
9C | COUNT | -------C |
9D | DEG | -------- |
9E | ERL | -------C |
9F | ERR | -------C |
A0 | EVAL | -------- |
A1 | EXP | -------- |
A2 | EXT | -------C |
A3 | FALSE | -------C |
A4 | FN | ----F--- |
A5 | GET | -------- |
A6 | INKEY | -------- |
A7 | INSTR | -------- |
A8 | INT | -------- |
A9 | LEN | -------- |
AA | LN | -------- |
AB | LOG | -------- |
AC | NOT | -------- |
AD | OPENUP | -------- |
AE | OPENOUT | -------- |
AF | PI | -------C |
B0 | POINT( | -------- |
B1 | POS | -------C |
B2 | RAD | -------- |
B3 | RND | -------C |
B4 | SGN | -------- |
B5 | SIN | -------- |
B6 | SQR | -------- |
B7 | TAN | -------- |
B8 | TO | -------- |
B9 | TRUE | -------C |
BA | USR | -------- |
BB | VAL | -------- |
BC | VPOS | -------C |
BD | CHR$ | -------- |
BE | GET$ | -------- |
BF | INKEY$ | -------- |
C0 | LEFT$( | -------- |
C1 | MID$( | -------- |
C2 | RIGHT$( | -------- |
C3 | STR$ | -------- |
C4 | STRING$( | -------- |
C5 | EOF | -------C |
C6 | AUTO | ---L---- |
C7 | DELETE | ---L---- |
C8 | LOAD | ------M- |
C9 | LIST | ---L---- |
CA | NEW | -------C |
CB | OLD | -------C |
CC | RENUMBER | ---L---- |
CD | SAVE | ------M- |
CE | PUT | -------- |
CF | PTR | -------- |
D0 | PAGE | -------- |
D1 | TIME | -------- |
D2 | LOMEM | -------- |
D3 | HIMEM | -------- |
D4 | SOUND | ------M- |
D5 | BPUT | ------MC |
D6 | CALL | ------M- |
D7 | CHAIN | ------M- |
D8 | CLEAR | -------C |
D9 | CLOSE | ------MC |
DA | CLG | -------C |
DB | CLS | -------C |
DC | DATA | --R----- |
DD | DEF | -------- |
DE | DIM | ------M- |
DF | DRAW | ------M- |
E0 | END | -------C |
E1 | ENDPROC | -------C |
E2 | ENVELOPE | ------M- |
E3 | FOR | ------M- |
E4 | GOSUB | ---L--M- |
E5 | GOTO | ---L--M- |
E6 | GCOL | ------M- |
E7 | IF | ------M- |
E8 | INPUT | ------M- |
E9 | LET | -----S-- |
EA | LOCAL | ------M- |
EB | MODE | ------M- |
EC | MOVE | ------M- |
ED | NEXT | ------M- |
EE | ON | ------M- |
EF | VDU | ------M- |
F0 | PLOT | ------M- |
F1 | PRINT | ------M- |
F2 | PROC | ----F-M- |
F3 | READ | ------M- |
F4 | REM | --R----- |
F5 | REPEAT | -------- |
F6 | REPORT | -------C |
F7 | RESTORE | ---L--M- |
F8 | RETURN | -------C |
F9 | RUN | -------C |
FA | STOP | -------C |
FB | COLOUR | ------M- |
FC | TRACE | ---L--M- |
FD | UNTIL | ------M- |
FE | WIDTH | ------M- |
FF | OSCLI | ------M- |
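The table can also be used in the other direction, to expand tokens back into keywords when listing a program. The C sketch below shows the idea with a handful of entries copied from the table. The names `TokenName`, `token_names`, `token_to_keyword` and `detokenise` are hypothetical. The sketch does not decode &8D line numbers, and copying unknown bytes through unchanged is an assumption of the sketch, not documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup entry: token byte and the keyword it stands for. */
typedef struct {
    unsigned char token;
    const char *text;
} TokenName;

/* A few entries copied from the Keyword Tokens table. */
static const TokenName token_names[] = {
    { 0x80, "AND" },
    { 0xDB, "CLS" },
    { 0xE0, "END" },
    { 0xE1, "ENDPROC" },
    { 0xF4, "REM" },
};

/* Returns the keyword for a token, or NULL if it is not in the excerpt. */
const char *token_to_keyword(unsigned char token)
{
    size_t i;
    for (i = 0; i < sizeof token_names / sizeof token_names[0]; i++) {
        if (token_names[i].token == token) {
            return token_names[i].text;
        }
    }
    return NULL;
}

/* Expands len bytes of tokenised text into dst (capacity cap, including
   the terminating NUL). Stops early if dst is full. Returns the length
   of the resulting string. */
size_t detokenise(const unsigned char *src, size_t len, char *dst, size_t cap)
{
    size_t i, out = 0;
    assert(src != NULL && dst != NULL && cap > 0);
    for (i = 0; i < len; i++) {
        const char *kw = src[i] >= 0x80 ? token_to_keyword(src[i]) : NULL;
        if (kw != NULL) {
            size_t n = strlen(kw);
            if (out + n >= cap) break;  /* no room for keyword plus NUL */
            memcpy(dst + out, kw, n);
            out += n;
        } else {
            if (out + 1 >= cap) break;  /* no room for byte plus NUL */
            dst[out++] = (char)src[i];
        }
    }
    dst[out] = '\0';
    return out;
}
```

Given the three bytes &DB, `:` and &E1, `detokenise` produces the text `CLS:ENDPROC`.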