Blocks 940 to 987 of StrongForth's default block file forth.blk contain a collection of samples and utilities. Some of them demonstrate specific features of StrongForth, while others point out certain aspects of data typing and may help in overcoming difficult situations.
Applying UPPER or LOWER to a character converts a letter to upper or lower case, respectively. Non-letter characters remain unchanged. To make the same conversions available for character strings, overloaded definitions of UPPER and LOWER are provided:
940 LOAD OK
CHAR @ LOWER . @ OK
CHAR A LOWER . a OK
PARSE-WORD STRONGFORTH OVER OVER LOWER TYPE strongforth OK
The ANS Forth definitions CHAR and [CHAR] are very handy when it comes to producing single character literals. However, these words can only produce printable ASCII characters. Whenever ASCII control characters are required, it might be useful to have a set of corresponding constants available.
CTRL and [CTRL] are variations of CHAR and [CHAR], respectively. They accept only letters in lower or upper case and return the corresponding control character:
941 943 THRU OK
CTRL G . beep! OK
: BELL [CTRL] G . ; OK
BELL beep! OK
All control characters that can be produced by CTRL are defined as constants in block 942. The remaining 7 control characters (including <NUL> and <DEL>) are defined using explicit type casts, for example:
27 CAST CHARACTER CONSTANT <ESC>
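The following Python sketch (not StrongForth code) models what CTRL computes and confirms the count of 7 remaining control characters. It assumes plain 7-bit ASCII, where keeping only the low five bits of a letter yields its control character; the function name ctrl is hypothetical.

```python
def ctrl(letter: str) -> str:
    """Model of CTRL: map an ASCII letter (either case) to its control character."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("CTRL accepts only the letters A-Z and a-z")
    # Keeping the low five bits maps 'A'/'a' -> 1, ..., 'Z'/'z' -> 26.
    return chr(ord(letter) & 0x1F)


# Control characters reachable through CTRL, and those needing explicit constants.
reachable = {ord(ctrl(c)) for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
all_controls = set(range(32)) | {127}          # 0..31 plus <DEL>
remaining = sorted(all_controls - reachable)   # <NUL>, <ESC>, 28..31, <DEL>
```

CTRL G and CTRL g both give code 7 (the bell), which is why the BELL example beeps.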
Sometimes it is necessary to temporarily switch the number conversion radix BASE to a different value, for example when entering a single hexadecimal number or displaying one number in binary, while all other number input and output is done in decimal. In Forth, this is rather clumsy:
BASE @ >R HEX 03FF R> BASE !
BASE @ >R 2 BASE ! . R> BASE !
In these cases, a word is useful that saves BASE, assigns it a new value, executes an arbitrary word and then immediately restores BASE to the original value. Here is how such a word can be used:
944 LOAD OK
H# 03FF . 1023 OK
33333 B# . 1000001000110101 OK
In block 944, five of these words are defined:
This is a good opportunity to show how to use defining words in StrongForth. :BASE actually looks almost the same as it would in ANS Forth, except for the stack diagram after DOES>. This stack diagram is mandatory. It tells the compiler that the runtime code of a new definition starts with an address of data type CONST -> UNSIGNED on the data stack. This is the address of the definition's data field, which contains the temporary number conversion radix as a constant. To make sure that the temporary value of BASE is stored in the constant data space, CONST-SPACE must be executed before creating any definitions with :BASE. Of course, this is not necessary if CONST, is used instead of , within :BASE. Remember that a definition's data field is always located in the constant data space.
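To illustrate the save/assign/execute/restore pattern behind :BASE, here is a minimal Python model. BASE is represented by a hypothetical global variable base, and with_base, make_base_word, h_sharp and b_sharp are illustrative names that only stand in for words like H# and B#, whose actual definitions are in block 944. The sketch restores the radix with try/finally, so it is also restored if the called function raises an exception; whether the block 944 words behave the same way under THROW is not covered here.

```python
# Hypothetical model: BASE is a module-level variable.
base = 10

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_digits(n):
    """Format a non-negative integer in the current radix (like U. without the space)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def parse_number(text):
    """Convert text to a number in the current radix."""
    return int(text, base)


def with_base(radix, fn, *args):
    """Save base, set it to radix, call fn, and restore base afterwards."""
    global base
    saved = base
    base = radix
    try:
        return fn(*args)
    finally:
        base = saved          # restored even if fn raises


def make_base_word(radix):
    """Rough analogue of :BASE: returns a function bound to one radix."""
    def word(fn, *args):
        return with_base(radix, fn, *args)
    return word


h_sharp = make_base_word(16)  # compare H#
b_sharp = make_base_word(2)   # compare B#
```

Mirroring the session above, h_sharp(parse_number, "03FF") yields 1023 and b_sharp(to_digits, 33333) yields "1000001000110101", after which base is 10 again.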
ANS Forth specifies several words that parse the input stream in order to obtain a character string to be compiled or otherwise processed. However, it is sometimes desirable to construct such a string at runtime instead of reading it from the input stream. Of these so-called parsing words, CREATE is the one most often needed in a version that takes the character string as an input parameter. For these cases, block 945 defines the word HEADER. HEADER is actually a non-parsing version of (CREATE) rather than of CREATE, because (CREATE) is more versatile. With HEADER, non-parsing versions of most defining words, like CREATE, CODE, CONSTANT, VARIABLE, VALUE and DEFER, can easily be defined. Here's an example:
WORDS HEADER
HEADER ( CONST CDATA -> CHARACTER UNSIGNED -- ) OK
: CODE-NOPARSE ( CDATA -> CHARACTER UNSIGNED -- ) ?EXECUTE CONST-HERE ROT ROT HEADER CODE-HERE CONST, END-CODE ; OK
The ANS Forth word LOCALS| is unintuitive in the sense that the names of the locals apparently have to be provided in the wrong order. For example, in
: TEST1 ( FLAG CHARACTER UNSIGNED -- ) LOCALS| U C F | U . C . SPACE F . ; OK
TRUE CHAR A 17 TEST1 17 A TRUE OK
the first input parameter is assigned to the last local and vice versa. The reason is that LOCALS| processes the names in its list one by one, assigning the top of the stack to the first name and so on. LOCALS( ... ) is an alternative to LOCALS| ... | that assigns the items on the stack to the locals in the correct order:
: TEST2 ( FLAG CHARACTER UNSIGNED -- ) LOCALS( F C U ) F . C . SPACE U . ; OK
TRUE CHAR A 17 TEST2 TRUE A 17 OK
LOCALS( uses a recursive algorithm instead of an iterative one, which elegantly reverses the order in which the locals are defined.
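A small Python model makes the difference between the two orders visible. It treats a Python list as the data stack (top at the end) and uses the hypothetical functions locals_bar and locals_paren. The recursion in locals_paren mirrors the idea described for LOCALS(, but it is a sketch, not the actual StrongForth code, which defines the locals at compile time.

```python
def locals_bar(names, stack):
    """LOCALS| model: iterate over the names; each takes the current top of stack."""
    bindings = {}
    for name in names:
        bindings[name] = stack.pop()
    return bindings


def locals_paren(names, stack):
    """LOCALS( model: recursion defers each binding until all later names are bound."""
    bindings = {}

    def bind(i):
        if i < len(names):
            bind(i + 1)                       # later names take their items first...
            bindings[names[i]] = stack.pop()  # ...then this name gets the new top

    bind(0)
    return bindings
```

With the stack [True, "A", 17], locals_bar(["U", "C", "F"], ...) binds U to 17, while locals_paren(["F", "C", "U"], ...) binds F to True, C to "A" and U to 17, matching TEST1 and TEST2.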
PERMUTATION calculates a permutation of an array of single cells with a given length. If n is the number of single cells, n! different permutations exist. The last input parameter of PERMUTATION specifies the index of the generated permutation. By iterating this index from 0 to n!-1, all possible permutations can be produced:
947 LOAD 948 LOAD OK
DATA-SPACE 1 VARIABLE M 2 , 3 , 4 , OK
: RESET ( DATA -> SINGLE UNSIGNED -- ) 0 DO I 1+ CAST SINGLE OVER I + ! LOOP DROP ; OK
: DISPLAY ( DATA -> SINGLE UNSIGNED -- ) 0 DO DUP I + @ . LOOP DROP CR ; OK
: PERMUTATION-ALL ( DATA -> SINGLE UNSIGNED -- ) DUP FAK 0 DO OVER OVER RESET OVER OVER I PERMUTATION DISPLAY LOOP DROP DROP ; OK
M 4 PERMUTATION-ALL
1 2 3 4
1 2 4 3
1 3 2 4
1 3 4 2
1 4 3 2
1 4 2 3
2 1 3 4
2 1 4 3
2 3 1 4
2 3 4 1
2 4 3 1
2 4 1 3
3 2 1 4
3 2 4 1
3 1 2 4
3 1 4 2
3 4 1 2
3 4 2 1
4 2 3 1
4 2 1 3
4 3 2 1
4 3 1 2
4 1 3 2
4 1 2 3
OK
MSWAP and FAK are helper words used by PERMUTATION; FAK calculates the factorial. Using locals in MSWAP is of course not necessary, but it makes the code clearer.
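The order of the 24 permutations listed above is consistent with a simple swap-based scheme: for each position k, the index is divided by the number of permutations of the remaining tail, and the quotient selects which tail element is swapped into position k. The Python model below reproduces that listing exactly, but the scheme is inferred from the output, not transcribed from block 948; permutation, permutation_all and mswap are hypothetical names, and math.factorial plays the role of FAK.

```python
from math import factorial  # plays the role of FAK


def mswap(a, i, j):
    """Exchange two elements of the array."""
    a[i], a[j] = a[j], a[i]


def permutation(a, index):
    """Rearrange list a in place into permutation number index (0 <= index < len(a)!)."""
    n = len(a)
    for k in range(n - 1):
        block = factorial(n - 1 - k)       # permutations of the remaining tail
        j, index = divmod(index, block)    # which tail element moves to position k
        mswap(a, k, k + j)
    return a


def permutation_all(n):
    """Like PERMUTATION-ALL: reset the array before each permutation."""
    return [permutation(list(range(1, n + 1)), i) for i in range(factorial(n))]
```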
It is easy to see that overloaded definitions of MSWAP and PERMUTATION for arrays of double cells or of characters can be produced without having to change anything but the stack diagrams. The consistent use of overloaded definitions for words being applied to different operands makes this possible.
Operator overloading allows defining multiple words with the same name but different stack diagrams. In ANS Forth, redefining a word means that the old version is hidden from the interpreter. In StrongForth, executing and compiling the old versions is still possible, provided that the stack diagrams of all overloaded versions can be clearly distinguished. But since the interpreter only considers the name and the input parameters when searching the dictionary for a match, it cannot distinguish overloaded versions that differ only in their output parameters. In those cases, the interpreter never finds the older version.
A typical example is the word BL:
BL ( -- CHARACTER )
Once the 8086 assembler is loaded, an overloaded version of BL shows up in the dictionary, because BL is also a register name:
BL ( -- MODE )
Since both versions have no input parameters, INTERPRET has no means to distinguish them and will always find the latest one:
BL .S . MODE 29556760 OK
With )INTERPRET, the inaccessible version of BL can be made available. )INTERPRET allows specifying the complete stack diagram, including the output parameters, in the dictionary search. Only words whose stack diagrams exactly match the given one are actually found. )INTERPRET is a state-smart word that works both in interpretation and in compilation state:
( -- CHARACTER )INTERPRET BL .S . CHARACTER OK
: TEST [CHAR] 0 ( -- CHARACTER )INTERPRET BL DO I . LOOP ; OK
TEST !"#$%&'()*+,-./ OK
( -- INTEGER )INTERPRET BL
( -- INTEGER )INTERPRET BL ? undefined word
The definition of )INTERPRET uses SEARCH-ALL with a new additional search criterion that is a combination of the additional search criteria implemented by MATCH and IDENTITY.
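The two search strategies can be contrasted in a few lines of Python. This toy compares input parameters by plain equality, whereas the real dictionary search matches them against the data type heap and accepts subtypes; DICTIONARY, find and find_exact are hypothetical names.

```python
# Hypothetical miniature dictionary, oldest entry first:
# (name, input parameters, output parameters)
DICTIONARY = [
    ("BL", (), ("CHARACTER",)),   # the original BL
    ("BL", (), ("MODE",)),        # added later by the 8086 assembler
]


def find(name, inputs):
    """Interpreter-style search: latest entry whose name and inputs match."""
    for entry in reversed(DICTIONARY):
        if entry[0] == name and entry[1] == inputs:
            return entry
    return None


def find_exact(name, inputs, outputs):
    """)INTERPRET-style search: the complete stack diagram must match."""
    for entry in reversed(DICTIONARY):
        if entry == (name, inputs, outputs):
            return entry
    return None
```

find("BL", ()) always returns the MODE version, while find_exact can still reach the CHARACTER version and finds nothing for ( -- INTEGER ).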
In most cases, EVALUATE is executed by an immediate word in compilation state. Any stack effects of the EVALUATEd words are directly applied to the compiler data type heap. A typical example is the application of EVALUATE in POSTPONE. The most important aspect of this use case is that the context of the string being evaluated is not the word into which EVALUATE is compiled, but the environment that actually executes the word containing EVALUATE. With respect to POSTPONE, the context of the string to be evaluated is the compile time of the word executing POSTPONE. But this use case is not restricted to compilation. In the following example, the context of the evaluated string is the interpreter:
: TEST ( -- ) " SPACES" EVALUATE ; OK
5 TEST OK
TEST does not have a stack diagram, but since SPACES expects an item of data type INTEGER on the data stack, the following usage fails:
TRUE TEST SPACES ? undefined word FLAG
Now, what can we do if the context of the string to be evaluated shall be the word in which EVALUATE is embedded? In ANS Forth, this is no problem at all, but StrongForth's data type system rejects it:
: TEST ( -- ) 5 " SPACES" EVALUATE ;
5 " SPACES" EVALUATE ; ? data types not congruent UNSIGNED
We have to face the problem that the stack effect of the character string being evaluated at runtime is not known to the compiler in general. What is required is an extended version of EVALUATE which informs the compiler about the expected stack effect of the evaluated string. Furthermore, the extended version has to validate at runtime that the evaluated string really has the stack effect the compiler assumed it would have. The previous example simply has to be rewritten as follows:
: TEST ( -- ) 5 " SPACES" ( UNSIGNED -- )EVALUATE ; OK
TEST OK
This is what )EVALUATE does: First of all, it creates a noname definition with the given stack diagram and the same execution token as NOOP. This means that the definition has no semantics. Next, )EVALUATE compiles the runtime code, which consists of the noname definition as a literal of data type DEFINITION and the word (EVALUATE). (EVALUATE) expects the character string and the noname definition as input parameters on the stack. Finally, )EVALUATE compiles the noname definition in order to apply its stack effect to the compiler data type heap. Because the noname definition has no semantics, no virtual code is compiled:
SEE TEST
: TEST ( -- ) 5 " SPACES" 477652010 (EVALUATE) ; OK
Now, what does (EVALUATE) do? As for EVALUATE, there are two overloaded versions for strings in the DATA and CONST memory areas:
(EVALUATE) ( CDATA -> CHARACTER UNSIGNED DEFINITION -- )
(EVALUATE) ( CCONST -> CHARACTER UNSIGNED DEFINITION -- )
The definitions of these two versions look exactly the same, because the only difference is the overloaded version of EVALUATE they execute. Their semantics are quite simple: they put the input parameters of the noname definition onto the current data type heap, evaluate the character string with EVALUATE, and then check whether the resulting contents of the data type heap match the output parameters of the noname definition.
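The check performed by (EVALUATE) can be imitated with a toy type checker in Python. Everything in it is hypothetical and much simplified: the type hierarchy and the two-word dictionary are made up, only types are tracked (the string is not actually executed), and the final comparison uses plain equality, whereas the real data type heap holds compound data types.

```python
# Hypothetical, simplified type hierarchy (child -> parent).
PARENT = {"UNSIGNED": "INTEGER", "SIGNED": "INTEGER", "INTEGER": "SINGLE", "FLAG": "SINGLE"}

# Hypothetical stack effects: name -> (input types, output types).
WORDS = {"SPACES": (["INTEGER"], []), "DROP": (["SINGLE"], [])}


def is_subtype(t, expected):
    while t is not None:
        if t == expected:
            return True
        t = PARENT.get(t)
    return False


def apply_word(name, heap):
    """Apply one word's stack effect to the type heap, checking its inputs."""
    inputs, outputs = WORDS[name]
    if len(heap) < len(inputs):
        raise TypeError(f"{name}: stack underflow")
    actual = heap[len(heap) - len(inputs):]
    for got, want in zip(actual, inputs):
        if not is_subtype(got, want):
            raise TypeError(f"{name}: expected {want}, got {got}")
    del heap[len(heap) - len(inputs):]
    heap.extend(outputs)


def checked_evaluate(source, inputs, outputs):
    heap = list(inputs)                 # 1. push the declared input types
    for name in source.split():         # 2. "evaluate" word by word
        apply_word(name, heap)
    if heap != list(outputs):           # 3. compare with the declared outputs
        raise TypeError(f"stack effect mismatch: expected {outputs}, got {heap}")
```

checked_evaluate("SPACES", ["UNSIGNED"], []) passes, corresponding to ( UNSIGNED -- )EVALUATE in TEST above; a FLAG input or a leftover item raises TypeError.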
Character strings are often stored in the CONST memory area. Typical examples are character strings compiled by ", and character strings located in the data field of a definition. Unfortunately, the contents of the CONST memory area may not be modified at runtime of an application, because this memory area might be read-only memory. Only character strings in the DATA memory area may be modified. <TRANSIENT expects a character string in the CONST memory area and returns a copy of it in the DATA memory area. It actually allocates a transient area in the local name space. After string processing is done, TRANSIENT> deallocates the transient area. Here's a simple example:
: FIND-INDEX ( UNSIGNED -- DEFINITION SIGNED ) [CHAR] 0 SWAP + >R " DUMMY_" <TRANSIENT OVER OVER + 1- R> SWAP ! OVER SWAP 0 CODE-FIELD SEARCH-ALL ROT TRANSIENT> ; OK
: DUMMY1 ; OK
: DUMMY2 ; OK
: DUMMY3 ; OK
1 FIND-INDEX . . -1 DUMMY1 ( -- ) OK
2 FIND-INDEX . . -1 DUMMY2 ( -- ) OK
3 FIND-INDEX . . -1 DUMMY3 ( -- ) OK
FIND-INDEX searches the dictionary for definitions with the name DUMMYn, where n is a numerical index between 0 and 9. Note that TRANSIENT> just expects the address of the string's first character on the data stack, because this information is sufficient in order to deallocate the string. <TRANSIENT and TRANSIENT> may be nested.
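Why the start address alone is enough for TRANSIENT>, and why nesting works, becomes clear if the transient area is thought of as a stack of copies. The Python class below is a hypothetical LIFO model of that idea, not StrongForth's actual allocation in the local name space; TransientArea, push_copy and release are illustrative names.

```python
class TransientArea:
    """Hypothetical LIFO model of a transient area for string copies."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0                      # first free byte

    def push_copy(self, data):
        """Like <TRANSIENT: copy data into the area, return (address, length)."""
        start, end = self.top, self.top + len(data)
        if end > len(self.buf):
            raise MemoryError("transient area exhausted")
        self.buf[start:end] = data
        self.top = end
        return start, len(data)

    def release(self, start):
        """Like TRANSIENT>: everything from start to the top is released."""
        if not 0 <= start <= self.top:
            raise ValueError("not an active transient string")
        self.top = start


# Patch the last character of a copied constant string, as FIND-INDEX does.
area = TransientArea(64)
addr, length = area.push_copy(b"DUMMY_")
area.buf[addr + length - 1] = ord("0") + 2
name = bytes(area.buf[addr:addr + length])   # b"DUMMY2"
area.release(addr)
```

Nested copies must be released in reverse order, just as nested <TRANSIENT ... TRANSIENT> pairs are.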
HIGH returns the upper half and LOW the lower half of a single-cell or a double-cell item. If the size of a single cell is 16 bits, half a cell is a byte:
HEX A71D DUP HIGH . A7 OK
LOW . 1D OK
If HIGH and LOW are applied to double-cell items, the result is a single-cell item:
HEX 700CB57E. DUP HIGH . 700C OK
LOW . B57E OK
Both words are overloaded for single-cell and double-cell items. Note that the output parameters of the single-cell versions have the same data types as the input parameters, whereas the double-cell versions always return items of data type SINGLE:
HIGH ( SINGLE -- 1ST )
HIGH ( DOUBLE -- SINGLE )
LOW ( SINGLE -- 1ST )
LOW ( DOUBLE -- SINGLE )
As an alternative, HIGH and LOW can also be implemented as colon definitions. However, it's clear that machine code implementations are faster.
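As a rough idea of what a colon-definition version computes, here is a Python sketch using shifts and masks. It assumes 16-bit cells (so a double-cell item has 32 bits); high and low are hypothetical names.

```python
CELL_BITS = 16  # assumption: 16-bit cells, as in the examples above


def high(x, bits=CELL_BITS):
    """Upper half of a bits-wide item."""
    half = bits // 2
    return (x >> half) & ((1 << half) - 1)


def low(x, bits=CELL_BITS):
    """Lower half of a bits-wide item."""
    half = bits // 2
    return x & ((1 << half) - 1)
```

For a double-cell item, pass bits=2 * CELL_BITS: high(0x700CB57E, 32) is 0x700C.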
The two sample words calculate the square root of an unsigned single or double number, respectively. Both words are almost identical, which shows how easy it is in StrongForth to adapt an existing word to a new data type. SQRT is an overloaded word that applies to unsigned single and unsigned double numbers:
954 LOAD OK
1000 SQRT . 31 OK
200000000. SQRT . 14142 OK
SQRT is an implementation of an iterative algorithm (Heron's method, a special case of Newton's method):
x(n+1) = ( x(n) + a/x(n) ) / 2
The iteration starts with x(0) = 255 for single numbers and x(0) = 65535 for double numbers. It stops when the difference between x(n+1) and x(n) is less than 2.
Because of a numeric overflow during the calculation, the results of SQRT are not correct if the radicand is equal to MAX-U (65535) or MAX-UD (4294967295).
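The iteration can be sketched in Python as follows. This is one reading of the description above (start value, integer division, stop when two successive estimates differ by less than 2, return the last estimate), not a transcription of block 954; sqrt_iter, sqrt_single and sqrt_double are hypothetical names. Python integers do not overflow, so the MAX-U/MAX-UD problem is not reproduced. With this stopping rule the result can occasionally be one above the integer square root (the sketch returns 2 for a radicand of 3); whether block 954 behaves the same way is not shown in the text.

```python
def sqrt_iter(a, x0):
    """Heron iteration x(n+1) = (x(n) + a / x(n)) / 2 with integer division,
    stopping once two successive estimates differ by less than 2."""
    x = x0
    while True:
        nxt = (x + a // x) // 2
        if abs(nxt - x) < 2:
            return nxt
        x = nxt                     # x never reaches 0 before the loop stops


def sqrt_single(a):
    return sqrt_iter(a, 255)        # start value for single numbers


def sqrt_double(a):
    return sqrt_iter(a, 65535)      # start value for double numbers
```

For 1000 the estimates run 255, 129, 68, 41, 32, 31, so the result is 31, as in the session above.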
COUNT-BITS returns the number of 1 bits in a single-cell or double-cell item. The version for double-cell items is based on the version for single-cell items. The implementation is quite tricky. Instead of looping through all 16 bits, counting is partly parallelized by first counting the 1 bits in 8 fields of 2 bits each, and then aggregating these 8 values to 4, to 2 and finally to one bit count. Applying arithmetical and logical operations in sequence requires multiple type casts, because StrongForth's arithmetical operations are defined on numeric data types only, whereas logical operations are defined on logical data types. This is one example where the type system seems to get in the way of the implementation. On the other hand, type casts do not cause any runtime overhead, and their presence is a pretty good indicator of tricky programming.
955 LIST
 0 \ COUNT-BITS
 1
 2 HEX
 3
 4 : COUNT-BITS ( SINGLE -- UNSIGNED )
 5   CAST UNSIGNED 5555 OVER CAST LOGICAL 1 RSHIFT AND -
 6   CAST LOGICAL 3333 OVER AND 3333 ROT 2 RSHIFT AND +
 7   CAST LOGICAL 0F0F OVER AND 0F0F ROT 4 RSHIFT AND +
 8   CAST LOGICAL 00FF OVER AND 00FF ROT 8 RSHIFT AND + ;
 9
10 DECIMAL
11
12 : COUNT-BITS ( DOUBLE -- UNSIGNED )
13   SPLIT COUNT-BITS SWAP COUNT-BITS + ;
14
15
OK
HEX F7B1 COUNT-BITS DECIMAL . 11 OK
H# F7B1 COUNT-BITS . 11 OK
H# B20CDA3E. COUNT-BITS . 16 OK
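The same four steps, written in Python for a 16-bit cell without any type casts, show the arithmetic more directly. The function names are hypothetical; the double-cell version splits the value into two 16-bit halves, as SPLIT appears to do in line 13 of the block.

```python
def count_bits16(x):
    """Parallel bit count for a 16-bit value, mirroring lines 5 to 8 of block 955."""
    x &= 0xFFFF
    x = x - ((x >> 1) & 0x5555)               # 8 two-bit counts
    x = (x & 0x3333) + ((x >> 2) & 0x3333)    # 4 four-bit counts
    x = (x & 0x0F0F) + ((x >> 4) & 0x0F0F)    # 2 eight-bit counts
    x = (x & 0x00FF) + ((x >> 8) & 0x00FF)    # 1 total count
    return x


def count_bits32(d):
    """Double-cell version: count both 16-bit halves and add."""
    return count_bits16(d >> 16) + count_bits16(d & 0xFFFF)
```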
Whenever a Forth programmer is in doubt about the stack diagram and/or the semantics of a word, the glossary is the first place to look for help. In StrongForth, an extended version of WORDS displays the stack diagrams of all words with a given name in the word list that is on top of the search order:
ALSO FORTH WORDS /STRING
/STRING ( CDATA -> CHARACTER UNSIGNED -- 1ST 3RD )
/STRING ( CDATA -> CHARACTER UNSIGNED INTEGER -- 1ST 3RD ) OK
In addition to this feature, the contents of the glossary can be displayed with HELP. HELP is a utility whose definitions are stored in blocks 956 to 958. It shows the glossary entries of all words with a given name in all word sets. Here's an example:
HELP /STRING
/STRING ( CDATA -> CHARACTER UNSIGNED INTEGER -- 1ST 3RD )
Adjust the character string at CDATA -> CHARACTER with length UNSIGNED by INTEGER characters. The resulting character string, specified by 1ST 3RD, begins at CDATA -> CHARACTER plus INTEGER characters and is UNSIGNED minus INTEGER characters long.
/STRING ( CDATA -> CHARACTER UNSIGNED -- 1ST 3RD )
Adjust the character string at CDATA -> CHARACTER with length UNSIGNED by one character. The resulting character string, specified by 1ST 3RD, begins at CDATA -> CHARACTER plus one character and is UNSIGNED minus one characters long.
OK
Creating qualified tokens that can be passed to EXECUTE and CATCH is rather inconvenient in StrongForth. After creating a data type for the qualified token, you have to find a suitable overloaded version of the desired word and then cast the (unqualified) token to a qualified token:
( UNSIGNED 1ST -- 1ST )PROCREATES TOKEN(U1--1) OK
4 0 DT TOKEN(U1--1) ?TOKEN / CAST TOKEN(U1--1) CATCH . DROP -10 OK
In ANS Forth, creating the execution token is much simpler:
... ' / CATCH ... \ ANS Forth code
Defining a data type for the qualified token can't be omitted, because using qualified tokens is necessary for retaining the consistency of the data type system. However, the calculation of the qualified token gives an opportunity for increasing the convenience of using EXECUTE and CATCH. Block 960 defines the word 'QTOKEN, which is similar to 'TOKEN, but creates a qualified token instead of an item of data type TOKEN. Using 'QTOKEN, the above example can be simplified:
960 LOAD OK
4 0 'QTOKEN TOKEN(U1--1) / CATCH . DROP -10 OK
'QTOKEN parses the name of the data type of the qualified token and the name of the word whose qualified token is to be calculated. It is type-safe, because it uses ?TOKEN before casting the token to a qualified token. However, it works only in interpretation state. During compilation, ['QTOKEN] has to be used instead. This word is also defined in block 960. Instead of
: EXAMPLE ... [ DT TOKEN(U1--1) ?TOKEN / CAST TOKEN(U1--1) ] LITERAL CATCH ...
you can simply write
: EXAMPLE ... ['QTOKEN] TOKEN(U1--1) / CATCH ...
StrongForth provides quite a number of different data types for addresses in order to support the x86 architecture. Addresses of data types DATA, CONST and CODE (and CDATA, CCONST and CCODE) refer to specific memory areas called segments. Since the size of a segment does not exceed 65536 address units, an address that is tied to a specific segment fits into one cell. Such an address may be called a near address. Addresses outside of the three predefined segments can only be specified as combinations of the segment and the offset. Both segment and offset are 16 bits wide, and the combination of both is called a far address. It fits into a double-cell item of data type FAR-ADDRESS or CFAR-ADDRESS.
Converting a far address to a near address is unsafe, because the information about the segment gets lost. But it is sometimes useful to convert a near address into a far address. The segment of a near address is a constant determined by the data type DATA, CONST or CODE. Data type ADDRESS, on the other hand, is not tied to a segment. Blocks 961 to 963 provide code definitions for the required conversion words:
FAR ( DATA -- FAR-ADDRESS )
FAR ( DATA -> SINGLE -- FAR-ADDRESS -> 2ND )
FAR ( DATA -> DOUBLE -- FAR-ADDRESS -> 2ND )
FAR ( CDATA -- CFAR-ADDRESS )
FAR ( CDATA -> SINGLE -- CFAR-ADDRESS -> 2ND )
FAR ( CONST -- FAR-ADDRESS )
FAR ( CONST -> SINGLE -- FAR-ADDRESS -> 2ND )
FAR ( CONST -> DOUBLE -- FAR-ADDRESS -> 2ND )
FAR ( CCONST -- CFAR-ADDRESS )
FAR ( CCONST -> SINGLE -- CFAR-ADDRESS -> 2ND )
FAR ( CODE -- FAR-ADDRESS )
FAR ( CODE -> SINGLE -- FAR-ADDRESS -> 2ND )
FAR ( CODE -> DOUBLE -- FAR-ADDRESS -> 2ND )
FAR ( CCODE -- CFAR-ADDRESS )
FAR ( CCODE -> SINGLE -- CFAR-ADDRESS -> 2ND )
Each predefined segment needs 5 conversion words to distinguish addresses of single-cell and double-cell items from addresses of character size items, and to retain the tail of compound data types. Here's a small example:
BASE .S DATA -> UNSIGNED OK
FAR .S FAR-ADDRESS -> UNSIGNED OK
@ . 10 OK
Far addresses to other segments can only be created with explicit type casts. When casting a double number to a far address, the higher part of the double number becomes the segment, and the lower part becomes the offset:
HEX B0000000. CAST FAR-ADDRESS CONSTANT VIDEO OK
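A Python sketch of the far address layout described above: the segment occupies the high 16 bits of the double-cell value and the offset the low 16 bits. DATA_SEGMENT is a made-up placeholder; on a real system the segment belonging to DATA, CONST or CODE is whatever StrongForth was loaded into. far, segment_of and offset_of are hypothetical names.

```python
DATA_SEGMENT = 0x1234  # hypothetical segment behind the DATA memory area


def far(near_offset, segment=DATA_SEGMENT):
    """Combine a segment and a 16-bit near address into a double-cell far address."""
    return ((segment & 0xFFFF) << 16) | (near_offset & 0xFFFF)


def segment_of(far_addr):
    """High cell of the double number: the segment."""
    return (far_addr >> 16) & 0xFFFF


def offset_of(far_addr):
    """Low cell of the double number: the offset."""
    return far_addr & 0xFFFF


VIDEO = 0xB0000000     # the double number cast to FAR-ADDRESS above
```

VIDEO thus denotes segment B000 with offset 0.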
StrongForth's basic implementations of KEY and KEY? are simplified with respect to the ANS Forth specification in the sense that they do not handle non-character keyboard events in the correct way. Non-character keyboard events, like pressing a function key F1 to F12, are represented by sequences of two character codes with the first one being the null character <NUL>. According to ANS Forth, non-character keyboard events shall be ignored by KEY and KEY?.
Full-featured versions of KEY? and KEY that comply with the specification are provided in blocks 964 and 965, respectively. The implementation is very similar to the example in appendix 10 (A.10.6.2.1305) of the ANS Forth specification.
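A minimal Python sketch of the filtering rule: a <NUL> code announces a two-code non-character event, and both codes are discarded. It is not a port of block 965 or of the ANS Forth appendix example; make_key is a hypothetical name, and the codes in the test (0 followed by 59 for F1) follow the usual PC BIOS convention.

```python
def make_key(raw_codes):
    """Return a KEY-like function over a sequence of raw keyboard codes.
    A code 0 (<NUL>) starts a two-code non-character event; both codes are skipped."""
    codes = iter(raw_codes)

    def key():
        for code in codes:
            if code == 0:
                next(codes, None)   # drop the second code of the event
                continue
            return chr(code)
        raise EOFError("no more input")

    return key
```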
In ANS Forth, arrays are typically defined with CREATE:
CREATE MYARRAY 10 CELLS ALLOT \ ANS Forth
Executing MYARRAY leaves the address of MYARRAY's data field on the stack, which is the first cell of the array. By adding zero-based indexes multiplied by the cell size to this address, you can calculate the addresses of the other cells. Since a word's data field is supposed to be in an area of memory with read and write access, the array elements can be accessed arbitrarily.
In StrongForth, a word's data field is located in the CONST memory area. In an embedded system, this memory area might be read-only at runtime. As a result, you cannot define an array with CREATE if you intend to write to it at runtime. Only arrays of constants may be defined in this way. Note that this important restriction is not a consequence of StrongForth's data type system.
StrongForth's only defining word that allocates memory in the DATA memory area is VARIABLE. With VARIABLE, you can define an array with read/write access:
NULL LOGICAL VARIABLE MYARRAY DATA-SPACE 9 CELLS ALLOT
Defining an array in this way looks rather clumsy. Only the first element is initialized, and you have to allocate one element less than the actual size of the array, because VARIABLE has already allocated the first one. Furthermore, you must not forget to select the data memory space before allocating the additional elements. A more readable definition looks like this:
NULL LOGICAL 10 ARRAY MYARRAY
ARRAY is a defining word that creates an array of the given size (10) in the data memory space, whose elements are automatically initialized to the given value (NULL LOGICAL). Its definition is contained in blocks 966 and 967:
966 967 THRU OK
NULL LOGICAL 10 ARRAY MYARRAY OK
MYARRAY .S DATA -> LOGICAL OK
4 + @ .S . LOGICAL 0 OK
13 BIT MYARRAY 4 + ! OK
MYARRAY 10 DUMP
0ACA: 0000 0000 0000 0000 2000 0000 0000 0000
0ADA: 0000 0000 OK
Blocks 966 and 967 actually contain the definitions of three overloaded versions of ARRAY for elements of data types SINGLE, DOUBLE and FLOAT or their respective direct or indirect subtypes:
ARRAY ( SINGLE UNSIGNED -- )
ARRAY ( DOUBLE UNSIGNED -- )
ARRAY ( FLOAT UNSIGNED -- )
CARRAY ( SINGLE UNSIGNED -- )
The additional defining word CARRAY allocates character-size elements:
CHAR A 10 CARRAY MYSTRING OK
MYSTRING .S 10 TYPE CDATA -> CHARACTER AAAAAAAAAA OK
CHAR B MYSTRING 6 + ! OK
MYSTRING 10 TYPE AAAAAABAAA OK
StrongForth requires all words to have well-defined stack diagrams. It is not possible to define ANS Forth words like ?DUP, FIND, PICK and ROLL, whose stack effects depend on conditions that are generally not known at compile time. Some of these words, like FIND, can easily be modified to have unambiguous stack diagrams. Most other words with ambiguous stack diagrams are dispensable. The need for PICK and ROLL, for example, arises only in badly factored code and can simply be avoided by using locals.
Nevertheless, many Forth programmers will miss ?DUP, because this word helps to keep the code short in some situations. ?DUP is typically used immediately before a conditional branch. Conditional branches are compiled by IF, UNTIL and WHILE. But even StrongForth's data type system does not require both branches to start with identical compiler data type heaps, which means that a combination of ?DUP and a conditional branch can be implemented without corrupting the data type system. This consideration leads to the definitions of ?IF, ?UNTIL and ?WHILE in blocks 968 and 969:
?IF ( -- ORIGIN ) \ is ?DUP IF
?UNTIL ( DESTINATION -- ) \ is ?DUP UNTIL
?WHILE ( DESTINATION -- ORIGIN 1ST ) \ is ?DUP WHILE
The stack diagrams of these three words are the same as those of IF, UNTIL and WHILE, respectively. The differences are that the new words compile ?BRANCH instead of 0BRANCH, and that they assume a stripped stack diagram at the destination of the branch. Compared with 0BRANCH, ?BRANCH has an additional output parameter, which is a copy of the input parameter.
0BRANCH ( SINGLE -- )
?BRANCH ( SINGLE -- 1ST )
The output parameter is only present if the branch is not taken. ?BRANCH is thus a combination of ?DUP and 0BRANCH. In order to keep the data type system consistent, ?IF and ?WHILE freeze the contents of the compiler data type heap minus the compound data type associated with this parameter. Similarly, ?UNTIL compares the contents of the compiler data type heap minus one compound data type with the one frozen by BEGIN. Here's a small example:
968 969 THRU OK
: TEST ( CHARACTER -- ) ?IF . ." is not <NUL>" THEN ; OK
CHAR X TEST X is not <NUL> OK
NULL CHARACTER TEST OK
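The runtime behaviour of ?BRANCH, and why both paths through TEST end with the same stack, can be modelled in Python with a list as the data stack (top at the end). q_branch and forth_test are hypothetical names, and forth_test is a hand translation of the TEST definition above, not compiled StrongForth code.

```python
def q_branch(stack):
    """Model of ?BRANCH at run time; returns True if the branch is taken.
    A zero value is consumed (like 0BRANCH); a non-zero value stays on the
    stack (like ?DUP followed by 0BRANCH)."""
    if stack[-1] == 0:
        stack.pop()
        return True
    return False


def forth_test(stack, out):
    # Models  : TEST ( CHARACTER -- ) ?IF . ." is not <NUL>" THEN ;
    if not q_branch(stack):                  # ?IF: fall through only for non-zero
        out.append(chr(stack.pop()) + " ")   # . prints the retained copy
        out.append("is not <NUL>")           # ." is not <NUL>"
    # Either way, the CHARACTER has been consumed when THEN is reached.
```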
The x86 architecture cleanly separates I/O addresses from memory addresses. Dedicated machine code instructions are needed to access I/O ports. That's why StrongForth provides the special data types PORT and CPORT for port addresses. However, you might have noticed that there are no words that actually deal with port addresses. The reason is that none of the commonly used desktop operating systems allow direct access to I/O ports. Therefore, the overloaded versions of @ and ! for port addresses have not been included in StrongForth.
In order to support systems that allow direct port access, like old versions of DOS, blocks 970 and 971 contain the missing definitions. Note that these are all code definitions that require the assembler to be loaded before they can be compiled.
! ( SINGLE PORT -> 1ST -- )
! ( DOUBLE PORT -> 1ST -- )
! ( SINGLE CPORT -> 1ST -- )
@ ( PORT -> SINGLE -- 2ND )
@ ( PORT -> DOUBLE -- 2ND )
@ ( CPORT -> SINGLE -- 2ND )
@ ( CPORT -> SIGNED -- 2ND )
@ ( CPORT -> FLAG -- 2ND )
The ANS Forth words PICK and ROLL are not supported by StrongForth, because the value of the index and thus their stack diagrams cannot be determined at compile time. Consider the following case:
5 BIT -243 CHAR B TRUE 62 .S LOGICAL SIGNED CHARACTER FLAG UNSIGNED OK
3 PICK OK
What should be on the interpreter data type heap now? Well, that's pretty obvious:
.S LOGICAL SIGNED CHARACTER FLAG UNSIGNED SIGNED OK
In interpretation state, it should be possible for PICK (and ROLL) to determine the resulting stack effect, because the value of the index parameter is always known. Things are different in compilation state. In the following example, a hypothetical immediate word PICK might be able to find out that the value on top of the stack at runtime will be the integer 3, so this intelligent version of PICK could compile the right code and update the compiler data type heap accordingly:
: FOO ( LOGICAL SIGNED CHARACTER FLAG UNSIGNED -- ... )
  3 PICK \ ( LOGICAL SIGNED CHARACTER FLAG UNSIGNED SIGNED )
  ... ;
Even if the index does not immediately precede PICK as a literal, a sufficiently clever PICK can still find out the value of the index at compile time. But certainly not in these cases:
: BAR1 ( LOGICAL SIGNED CHARACTER FLAG UNSIGNED UNSIGNED -- ... )
  PICK \ ( ??? )
  ... ;
: BAR2 ( LOGICAL SIGNED CHARACTER FLAG UNSIGNED -- ... )
  PARSE-WORD NUMBER DROP CAST UNSIGNED PICK \ ( ??? )
  ... ;
Since our intelligent PICK is an immediate word, we might decide to pass the index parameter as an interpreted value. This version will always work, because the value of the index parameter is calculated at compile time:
: FOOBAR ( LOGICAL SIGNED CHARACTER FLAG UNSIGNED -- ... )
  [ 3 ] PICK \ ( LOGICAL SIGNED CHARACTER FLAG UNSIGNED SIGNED )
  ... ;
It is actually possible to implement state-smart immediate words PICK and ROLL that work this way. They are defined in blocks 972 to 979. Both words expect the index as a parameter of data type UNSIGNED:
PICK ( UNSIGNED -- )
ROLL ( UNSIGNED -- )
Note that not all applications of PICK and ROLL as specified by ANS Forth are covered by this particular implementation. If the index cannot be calculated at compile time, these words cannot work. On the other hand, the StrongForth versions of PICK and ROLL are clever enough to deal with data items instead of just cells. It is possible to pick or roll a single-cell item, a double-cell item and even a floating-point number out of an arbitrary list of different data types. The following examples give a feeling for it:
FALSE LATEST 5.915E-4 7 BIT .S FLAG DEFINITION FLOAT LOGICAL OK
3 ROLL .S DEFINITION FLOAT LOGICAL FLAG OK
2 PICK .S DEFINITION FLOAT LOGICAL FLAG FLOAT OK
. . . . . 0.0005915 FALSE 128 0.0005915 ROLL ( UNSIGNED -- ) OK
: TEST ( SIGNED-DOUBLE DATA -> UNSIGNED FLOAT FLAG -- ) [ 2 ] ROLL .S [ 1 ] PICK . . . . . ; SIGNED-DOUBLE FLOAT FLAG DATA -> UNSIGNED OK
+7154390. BASE 1E13 TRUE TEST TRUE 1492 TRUE 10000000000000. 7154390 OK
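A Python model shows why working on items rather than cells is natural once the types are known: the same pick or roll operation can be applied to a stack of (type, value) pairs at run time, or to a list of type names at compile time. pick, roll and types are hypothetical names; the real words must additionally translate the item index into cell offsets (and handle floating-point items), which this sketch omits.

```python
def pick(stack, n):
    """Copy the item n positions below the top (0 = top) onto the top."""
    stack.append(stack[-1 - n])


def roll(stack, n):
    """Move the item n positions below the top (0 = top) onto the top."""
    stack.append(stack.pop(-1 - n))


def types(stack):
    """The data type heap that corresponds to a stack of (type, value) items."""
    return [t for t, _ in stack]


# The interactive session above, replayed on (type, value) pairs.
stack = [("FLAG", False), ("DEFINITION", "ROLL"), ("FLOAT", 5.915e-4), ("LOGICAL", 128)]
roll(stack, 3)
pick(stack, 2)

# In compilation state only the types are known; with a compile-time index,
# the same operation updates the compiler data type heap of TEST.
heap = ["SIGNED-DOUBLE", "DATA -> UNSIGNED", "FLOAT", "FLAG"]
roll(heap, 2)
```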
The ANS Forth word RESIZE tries to change the size of a memory block that has been allocated with FAR-ALLOCATE or CFAR-ALLOCATE. This operation always succeeds if the new size is less than the old one. If the new size is greater than the old one, the operation might fail, because the requested amount of memory is not available.
StrongForth relies on the memory allocation functions of the operating system. It does not even try to allocate a new memory block at a different address if the operating system cannot extend the existing memory block, although this is allowed by the ANS Forth specification. This means that an attempt to resize an existing memory block will either fail or succeed at the same address. Blocks 980 to 982 provide an enhanced version of RESIZE that takes a second chance: if resizing the memory block at its original location fails, it tries to allocate a memory block of the new size at a different location. If this second attempt succeeds, the contents of the original memory block are copied to the new memory block, and then the original memory block is released.
In order to be able to copy the contents of the memory block, RESIZE needs versions of MOVE for addresses of data type FAR-ADDRESS, which are not yet included in StrongForth:
MOVE ( CFAR-ADDRESS -> SINGLE CFAR-ADDRESS -> 2ND UNSIGNED -- )
MOVE ( FAR-ADDRESS -> DOUBLE FAR-ADDRESS -> 2ND UNSIGNED -- )
MOVE ( FAR-ADDRESS -> SINGLE FAR-ADDRESS -> 2ND UNSIGNED -- )
RESIZE ( CFAR-ADDRESS UNSIGNED -- 1ST SIGNED )
RESIZE ( FAR-ADDRESS UNSIGNED -- 1ST SIGNED )
Here's an example of a case where the standard version of RESIZE fails, and the enhanced version succeeds:
900 939 THRU \ load memory-allocation word set OK
15000 FAR-ALLOCATE . CONSTANT MB1 0 OK
15000 FAR-ALLOCATE . CONSTANT MB2 0 OK
MB1 20000 RESIZE . CONSTANT MB3 \ failure -308 OK
980 982 THRU \ load enhanced version of RESIZE OK
MB1 SIZE . 15008 OK
MB1 20000 RESIZE . CONSTANT MB3 \ success 0 OK
HEX MB1 . MB3 . \ compare addresses of old and new memory block 4CBE0000 54140000 OK
DECIMAL MB3 SIZE . 20000 OK
In many applications of conditional clauses, only a single word is to be compiled between IF and THEN. ?? provides a shortcut for those cases. Typical words ?? is applied to are EXIT, LEAVE, NEGATE and CR. For example, instead of writing
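The strategy of the enhanced RESIZE can be sketched in Python on a toy allocator in which blocks never move by themselves and can only grow into free space directly behind them, similar to the situation in the session above where MB2 sits right after MB1. Arena and resize are hypothetical names; the real words work on far addresses, call the operating system and return an ior instead of raising MemoryError.

```python
class Arena:
    """Toy first-fit allocator over a fixed buffer."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.blocks = {}                      # start offset -> length

    def allocate(self, n):
        pos = 0
        for start in sorted(self.blocks):     # look for a big enough gap
            if start - pos >= n:
                break
            pos = start + self.blocks[start]
        if pos + n > len(self.mem):
            raise MemoryError("arena exhausted")
        self.blocks[pos] = n
        return pos

    def free(self, start):
        del self.blocks[start]

    def resize_in_place(self, start, n):
        """What the plain RESIZE can do: grow only into directly following free space."""
        end = start + self.blocks[start]
        limit = min((s for s in self.blocks if s >= end), default=len(self.mem))
        if start + n <= limit:
            self.blocks[start] = n
            return True
        return False


def resize(arena, start, n):
    """Enhanced RESIZE model: try in place, else allocate, copy and free."""
    if arena.resize_in_place(start, n):
        return start
    new = arena.allocate(n)                   # on failure the old block is untouched
    keep = min(arena.blocks[start], n)
    arena.mem[new:new + keep] = arena.mem[start:start + keep]
    arena.free(start)
    return new
```

Shrinking always succeeds in place, which matches the statement that a reduction in size never fails.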
: ABS ( SIGNED -- 1ST ) DUP 0< IF NEGATE THEN ;
you can alternatively write
: ABS ( SIGNED -- 1ST ) DUP 0< ?? NEGATE ;
The compiled code is exactly the same.
Block 984 contains the definition of a word named .ROMAN, which displays an unsigned number in Roman numerals:
984 LOAD OK
1492 .ROMAN MCDXCII OK
2007 .ROMAN MMVII OK
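The conversion performed by .ROMAN can be sketched with the usual greedy algorithm over a table that includes the subtractive pairs (CM, CD, XC, XL, IX, IV). The following Python function to_roman is an illustration, not a transcription of block 984. How .ROMAN treats 0 and numbers of 4000 and above is not shown here; the sketch simply rejects numbers below 1 and repeats M for large numbers.

```python
def to_roman(n):
    """Return the Roman numeral for a positive integer (greedy, subtractive pairs)."""
    if n < 1:
        raise ValueError("Roman numerals have no zero or negative numbers")
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    digits = []
    for value, symbol in table:
        count, n = divmod(n, value)  # how often the symbol fits, and the remainder
        digits.append(symbol * count)
    return "".join(digits)


print(to_roman(1492))  # MCDXCII
print(to_roman(2007))  # MMVII
```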
[DEFINED] and [UNDEFINED] are common extensions to ANS Forth. Each of them parses a name: [DEFINED] returns a true flag if a word with that name has been defined, and [UNDEFINED] returns a true flag if it has not. The flag is typically consumed by [IF]. You can make these two words available by loading block 985:
985 LOAD OK
[DEFINED] ACCEPT . TRUE OK
[DEFINED] XLERB . FALSE OK
Blocks 986 and 987 contain an implementation of the Quicksort algorithm.
986 987 THRU OK
WORDS QUICKSORT
QUICKSORT ( DATA -> SINGLE 1ST (S1--F) -- ) OK
QUICKSORT sorts a sequence of single-cell items that are stored in memory starting at address DATA -> SINGLE. 1ST is the address of the last cell to be included. (S1--F) is the qualified token of a word that compares two single-cell items and returns an item of data type FLAG:
EXECUTE ( SINGLE 1ST (S1--F) -- FLAG )
By choosing an appropriate comparison word, items of any data type can be sorted in any desired order. Here's a simple example that sorts 10 unsigned single numbers in either direction, using < and > as comparison words:
DATA-SPACE OK
HERE CAST DATA -> UNSIGNED CONSTANT START OK
1 , 6 , 0 , 2 , 5 , 9 , 3 , 8 , 7 , 4 , OK
HERE CAST DATA -> UNSIGNED CONSTANT END OK
: PRINT END START DO I @ . LOOP ; OK
PRINT 1 6 0 2 5 9 3 8 7 4 OK
START END 1- ( INTEGER 1ST -- FLAG )' < >CODE CAST (S1--F) QUICKSORT OK
PRINT 0 1 2 3 4 5 6 7 8 9 OK
START END 1- ( INTEGER 1ST -- FLAG )' > >CODE CAST (S1--F) QUICKSORT OK
PRINT 9 8 7 6 5 4 3 2 1 0 OK
The items of data type SINGLE do not need to be numbers. An item might even be a pointer to a structure, provided the structures can be sorted in some meaningful way and a word that compares two structures is available. In this case, the result would be a sorted list of pointers, with the first pointer pointing to the structure that comes first with respect to the ordering criterion.
This implementation of Quicksort demonstrates both recursion and the usage of qualified tokens. However, note that the type casts to (S1--F) in the above example are potentially unsafe. You have to make sure yourself that < and > are suitable comparison words for this application. On the other hand, allowing a rather generic stack diagram for the comparison word (SINGLE instead of INTEGER) makes the algorithm applicable to items that are not numbers. Only one restriction remains: the items to be sorted must be cell-sized. If you want to sort items of double-cell or character size, or floating-point numbers, appropriate overloaded versions of QUICKSORT have to be implemented, like these:
QUICKSORT ( DATA -> DOUBLE 1ST (D1--F) -- )
QUICKSORT ( CDATA -> SINGLE 1ST (C1--F) -- )
QUICKSORT ( DATA -> FLOAT 1ST (F1--F) -- )
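The division of labour between QUICKSORT and its comparison word can be sketched in Python. The function below takes the index of the first and of the last item (inclusive, like 1ST) and a comparison function that plays the role of the (S1--F) token. It uses the Lomuto partition scheme for brevity; the partitioning used in blocks 986 and 987 may well differ. Since Python lists hold references, the last call corresponds to sorting pointers to (hypothetical) structures.

```python
import operator


def quicksort(items, first, last, before):
    """Sort items[first..last] in place (last inclusive, like 1ST in QUICKSORT).

    before(a, b) plays the role of the comparison word: it returns True if a
    has to be placed in front of b.
    """
    if first >= last:
        return
    pivot = items[last]
    i = first  # items[first:i] hold the items that go in front of the pivot
    for j in range(first, last):
        if before(items[j], pivot):
            items[i], items[j] = items[j], items[i]
            i += 1
    items[i], items[last] = items[last], items[i]  # pivot to its final place
    quicksort(items, first, i - 1, before)  # recursion on both partitions
    quicksort(items, i + 1, last, before)


numbers = [1, 6, 0, 2, 5, 9, 3, 8, 7, 4]
quicksort(numbers, 0, len(numbers) - 1, operator.lt)
print(numbers)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
quicksort(numbers, 0, len(numbers) - 1, operator.gt)
print(numbers)  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# Hypothetical structures, ordered by comparing one of their fields.
records = [{"name": "c", "key": 3}, {"name": "a", "key": 1}, {"name": "b", "key": 2}]
quicksort(records, 0, len(records) - 1, lambda x, y: x["key"] < y["key"])
print([r["name"] for r in records])  # ['a', 'b', 'c']
```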
The Programming-Tools word set contains the word SEE, which displays the virtual machine code of a colon definition. Unfortunately, this word cannot be used to display the code of virtual member functions, even though virtual members, just like deferred words, might be colon definitions as well:
SEE DESTRUCTOR
SEE DESTRUCTOR ? is not a colon definition
SEE-VIRTUAL, whose definition is contained in block 988, does the job. Since virtual members may differ between derived classes, SEE-VIRTUAL expects on the stack the class whose virtual member is to be displayed.
180 186 THRU \ Memory-Allocation word set OK
364 399 THRU \ OOP word set OK
988 LOAD OK
DT OBJECT SEE-VIRTUAL DESTRUCTOR
VIRTUAL DESTRUCTOR ( OBJECT -- 1ST )
:NONAME ( OBJECT -- 1ST )
; IS DESTRUCTOR OK
The DESTRUCTOR virtual member of class OBJECT is just a dummy word that does nothing. It was chosen for this example because OBJECT is the only ready-made class and DESTRUCTOR is its only virtual member.
Operator overloading allows defining different versions of a word for different data types. For example, the three versions of NEGATE cover single numbers, double numbers, and floating-point numbers:
NEGATE ( INTEGER -- 1ST )
NEGATE ( INTEGER-DOUBLE -- 1ST )
NEGATE ( FLOAT -- 1ST )
In most of these cases, like NEGATE, the number of parameters is the same in all overloaded versions. But overloading can also be used to define versions with default parameters. One of the rare examples in StrongForth is /STRING. The basic version of /STRING expects a character string and a count of data type UNSIGNED. The second version expects only a character string and assumes a default count of 1. If the two overloaded versions weren't machine code definitions, one could choose to define them like this:
: /STRING ( CDATA -> CHARACTER UNSIGNED INTEGER -- 1ST 3RD ) ROT OVER + ROT ROT - ; OK
: /STRING ( CDATA -> CHARACTER UNSIGNED -- 1ST 3RD ) 1 /STRING ; OK
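It is important to note that the version with the default count uses the basic version and must therefore be defined after the basic version. In this case, this order does not cause a problem, because the interpreter and the compiler can always cleanly distinguish the two versions: when a count lies on top of a character string, the item below the count is the string length of data type UNSIGNED, which does not match the address that the shorter version expects in that position. But what about the following case?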
: DRAW-CIRCLE ( SIGNED SIGNED UNSIGNED -- )
  \ ... \
  ." Drawing circle with radius " . ." at (" SWAP 0 .R [CHAR] , . 0 .R [CHAR] ) . ; OK
: DRAW-CIRCLE ( SIGNED SIGNED -- ) 100 DRAW-CIRCLE ; OK
: DRAW-CIRCLE ( UNSIGNED -- ) +0 +0 ROT DRAW-CIRCLE ; OK
: DRAW-CIRCLE ( -- ) 100 DRAW-CIRCLE ; OK
Obviously, the intention is to define a word that draws a circle on the screen with a given centre point and a given radius. Additionally, three overloaded versions are provided that assign default values to the radius and to the centre. But that won't work. The interpreter and the compiler will always find the last version, because it has no parameters and therefore matches any stack contents, acting as a catch-all in the dictionary. The parameters stay on the stack:
+20 -30 200 DRAW-CIRCLE Drawing circle with radius 100 at (0,0) OK
.S SIGNED SIGNED UNSIGNED OK
WORDS DRAW-CIRCLE
DRAW-CIRCLE ( -- )
DRAW-CIRCLE ( UNSIGNED -- )
DRAW-CIRCLE ( SIGNED SIGNED -- )
DRAW-CIRCLE ( SIGNED SIGNED UNSIGNED -- ) OK
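The order of the words in the dictionary has to be changed, so that each version is found before any version whose input parameters match the top part of its own parameter list. Listed in search order, as WORDS displays them, the correct order is either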
DRAW-CIRCLE ( SIGNED SIGNED UNSIGNED -- )
DRAW-CIRCLE ( UNSIGNED -- )
DRAW-CIRCLE ( SIGNED SIGNED -- )
DRAW-CIRCLE ( -- )
or
DRAW-CIRCLE ( SIGNED SIGNED UNSIGNED -- )
DRAW-CIRCLE ( SIGNED SIGNED -- )
DRAW-CIRCLE ( UNSIGNED -- )
DRAW-CIRCLE ( -- )
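The search rule behind this problem can be modelled in a few lines of Python. The sketch is a simplification under stated assumptions: each version is represented only by the tuple of its input data types (S and U abbreviate SIGNED and UNSIGNED), a version matches if these types are identical to the topmost entries of the type stack (StrongForth's matching of subtypes, such as UNSIGNED for INTEGER, is ignored), and the dictionary is searched from the latest definition backwards.

```python
def resolve(dictionary, type_stack):
    """Return the inputs of the first matching version, searching latest first.

    dictionary lists the versions in definition order (oldest first); a version
    matches if its input types equal the topmost entries of type_stack, whose
    last element is the top of the stack.
    """
    for inputs in reversed(dictionary):
        n = len(inputs)
        if n <= len(type_stack) and tuple(type_stack[len(type_stack) - n:]) == inputs:
            return inputs
    return None


S, U = "SIGNED", "UNSIGNED"
as_defined = [(S, S, U), (S, S), (U,), ()]  # definition order of the first attempt
fixed = [(), (U,), (S, S), (S, S, U)]       # basic version at the top of the search

print(resolve(as_defined, [S, S, U]))  # (): the catch-all wins
print(resolve(fixed, [S, S, U]))       # ('SIGNED', 'SIGNED', 'UNSIGNED')
```

The list fixed only describes the required search order. Reaching that order is another matter, because the basic version is needed by the others and therefore has to exist before them.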
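On the other hand, both correct orders require the basic version to be defined last, although the other versions use it. How can a version be defined that is based on a version that is not yet defined? With DEFER? Yes, that's possible: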
DEFER DRAW-CIRCLE ( -- )
DEFER DRAW-CIRCLE ( UNSIGNED -- )
DEFER DRAW-CIRCLE ( SIGNED SIGNED -- )
: DRAW-CIRCLE ( SIGNED SIGNED UNSIGNED -- )
  \ ... \
  ." Drawing circle with radius " . ." at (" SWAP 0 .R [CHAR] , . 0 .R [CHAR] ) . ; OK
:NONAME ( SIGNED SIGNED -- ) 100 DRAW-CIRCLE ; IS DRAW-CIRCLE OK
:NONAME ( UNSIGNED -- ) +0 +0 ROT DRAW-CIRCLE ; IS DRAW-CIRCLE OK
:NONAME ( -- ) 100 DRAW-CIRCLE ; IS DRAW-CIRCLE OK
However, this looks pretty clumsy, and there's a runtime penalty because of the additional level of indirection. A more elegant solution is to change the position of a definition in the dictionary after it has been defined. RETREAT, whose definition is available in block 989, moves the latest definition a given number of positions down in the dictionary by simply exchanging the links of the definitions. Here's how it works in our example:
989 LOAD OK
: DRAW-CIRCLE ( SIGNED SIGNED UNSIGNED -- )
  \ ... \
  ." Drawing circle with radius " . ." at (" SWAP 0 .R [CHAR] , . 0 .R [CHAR] ) . ; OK
: DRAW-CIRCLE ( SIGNED SIGNED -- ) 100 DRAW-CIRCLE ; 1 RETREAT OK
: DRAW-CIRCLE ( UNSIGNED -- ) +0 +0 ROT DRAW-CIRCLE ; 2 RETREAT OK
: DRAW-CIRCLE ( -- ) 100 DRAW-CIRCLE ; 3 RETREAT OK
+20 -30 200 DRAW-CIRCLE Drawing circle with radius 200 at (20,-30) OK
.S OK
WORDS DRAW-CIRCLE
DRAW-CIRCLE ( SIGNED SIGNED UNSIGNED -- )
DRAW-CIRCLE ( SIGNED SIGNED -- )
DRAW-CIRCLE ( UNSIGNED -- )
DRAW-CIRCLE ( -- ) OK
LATEST . DRAW-CIRCLE ( -- ) OK
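RETREAT works on the linked list that the dictionary search follows. The following Python model is hypothetical (classes Definition and Wordlist, method retreat) and is not the code from block 989: it unlinks the first definition in search order and splices it back in behind the n-th older one, then replays the four definitions from the transcript. Following the transcript, latest is kept as a separate reference that RETREAT does not change, and type matching is simplified to exact equality.

```python
class Definition:
    def __init__(self, inputs, link):
        self.inputs = inputs  # input data types, e.g. ("SIGNED", "SIGNED")
        self.link = link      # next older definition in search order, or None


class Wordlist:
    def __init__(self):
        self.head = None    # where the dictionary search starts
        self.latest = None  # most recently compiled definition

    def define(self, inputs):
        self.head = self.latest = Definition(inputs, self.head)

    def retreat(self, n):
        """Move the first definition in search order n positions down by relinking."""
        moved = self.head
        if moved is None or n < 1:
            raise ValueError("nothing to move, or n < 1")
        prev = moved.link  # walk to the n-th older definition
        for _ in range(n - 1):
            if prev is None:
                break
            prev = prev.link
        if prev is None:
            raise ValueError("fewer than n older definitions")
        self.head = moved.link  # unlink the moved definition ...
        moved.link = prev.link  # ... and splice it in behind prev
        prev.link = moved

    def search_order(self):
        d = self.head
        while d is not None:
            yield d.inputs
            d = d.link

    def resolve(self, type_stack):
        """First version in search order whose inputs equal the top of type_stack."""
        for inputs in self.search_order():
            n = len(inputs)
            if n <= len(type_stack) and tuple(type_stack[len(type_stack) - n:]) == inputs:
                return inputs
        return None


S, U = "SIGNED", "UNSIGNED"
dc = Wordlist()
dc.define((S, S, U))
dc.define((S, S))
dc.retreat(1)
dc.define((U,))
dc.retreat(2)
dc.define(())
dc.retreat(3)
print(list(dc.search_order()))
# [('SIGNED', 'SIGNED', 'UNSIGNED'), ('SIGNED', 'SIGNED'), ('UNSIGNED',), ()]
print(dc.resolve([S, S, U]))  # ('SIGNED', 'SIGNED', 'UNSIGNED')
print(dc.latest.inputs)       # ()
```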
Overloaded versions with default parameters are frequently used in object-oriented programming, especially as constructors. That's a typical application for RETREAT.
Smart pointers are a rather strange feature of C++, which can easily be implemented in StrongForth. The word @@, whose definition can be found in block 990, evaluates @ repeatedly as long as an overloaded version exists that matches the compound data type on top of the data type heap:
: @@ ( -- ) BEGIN " @" TRANSIENT FALSE MATCH SEARCH-ALL NIP WHILE POSTPONE @ REPEAT ; IMMEDIATE
@@ can be useful when long chains of indirection have to be handled, as in this example:
CHAR & PAD ! OK
PAD VARIABLE VARPAD OK
VARPAD VARIABLE VARVARPAD OK
VARVARPAD .S DATA -> DATA -> CDATA -> CHARACTER OK
@ @ @ . & OK
VARVARPAD @@ .S . CHARACTER & OK
In C++, smart pointers are commonly used in object-oriented programming. Consider a class that defines a member word @ that returns an object of another class. The other class also has a member word @, and so on. @@ will then repeatedly execute or compile @, traversing the chain of indirections until it reaches an item for which no overloaded version of @ exists. This item remains on the stack.
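The behaviour of @@ can be modelled in Python, with one important difference: StrongForth selects the versions of @ at compile time from the data types on the data type heap, while this sketch dispatches on the run-time types of Python objects. Cell and Wrapper are hypothetical stand-ins for a memory cell and for a class with a member word @; the table FETCH plays the role of the overloaded versions of @.

```python
class Cell:
    """Hypothetical stand-in for a memory cell: @ returns its content."""

    def __init__(self, value):
        self.value = value


class Wrapper:
    """Hypothetical class with its own member word @."""

    def __init__(self, inner):
        self.inner = inner


# The overloaded versions of @, keyed by the data type each one accepts.
FETCH = {
    Cell: lambda cell: cell.value,
    Wrapper: lambda wrapper: wrapper.inner,
}


def fetch_all(item):
    """Model of @@: apply @ as long as an overloaded version matches the item."""
    while type(item) in FETCH:
        item = FETCH[type(item)](item)
    return item


pad = Cell("&")           # the memory at PAD, holding the character &
varpad = Cell(pad)        # VARPAD, holding the address of PAD
varvarpad = Cell(varpad)  # VARVARPAD, holding the address of VARPAD
print(fetch_all(varvarpad))                    # &
print(fetch_all(Wrapper(Cell(Wrapper("x")))))  # x
```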
The technique of overloading words often results in sets of definitions that look very similar. A good example is the set of 12 versions of DUMP, whose definitions are stored in blocks 84 to 89. The first three of them, which cover single-cell items in the DATA, CONST and CODE memory areas, are in fact identical except for the data type of the first input parameter:
: DUMP ( DATA UNSIGNED -- ) OVER -> SINGLE SWAP + LOCALS| END | -> SINGLE BEGIN DUP END < WHILE DUP 8 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
: DUMP ( CONST UNSIGNED -- ) OVER -> SINGLE SWAP + LOCALS| END | -> SINGLE BEGIN DUP END < WHILE DUP 8 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
: DUMP ( CODE UNSIGNED -- ) OVER -> SINGLE SWAP + LOCALS| END | -> SINGLE BEGIN DUP END < WHILE DUP 8 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
Of course, proper factoring can help to avoid copying most of the phrases in these definitions, but beware that their virtual machine code is not 100% identical: depending on the data type of the starting address, different versions of @ are compiled within the inner loop. This makes factoring difficult in this case. But there's actually a solution that avoids writing almost identical definitions. Templates are source code phrases that serve as a common pattern. In the example of DUMP, the template looks like this:
: DUMP ( address UNSIGNED -- ) OVER -> SINGLE SWAP + LOCALS| END | -> SINGLE BEGIN DUP END < WHILE DUP 8 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
address is a placeholder for either DATA, CONST or CODE. After blocks 991 to 993 have been loaded and the template has been stored in a text file called DUMP.FTH, all three versions of DUMP can be compiled by instantiating the template:
" DUMP.FTH" INSTANTIATE" address DATA" INCLUDE
" DUMP.FTH" INSTANTIATE" address CONST" INCLUDE
" DUMP.FTH" INSTANTIATE" address CODE" INCLUDE
INSTANTIATE" expects a character string with the file name of the template on the stack. It then parses a word for the placeholder and a string for the replacement. The template is then copied into a temporary file, replacing all occurrences of the placeholder word with the replacement string. Since INSTANTIATE" returns the temporary file as an item of data type FILE, it can immediately be processed with INCLUDE. An overloaded version of INSTANTIATE" that expects a file instead of a character string with the name of the file makes it possible to chain temporary files in cases where the template contains multiple different placeholders. Suppose we wanted to cover three more versions of DUMP with the same template:
: DUMP ( DATA -> DOUBLE UNSIGNED -- ) OVER SWAP + LOCALS| END | BEGIN DUP END < WHILE DUP 4 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
: DUMP ( CONST -> DOUBLE UNSIGNED -- ) OVER SWAP + LOCALS| END | BEGIN DUP END < WHILE DUP 4 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
: DUMP ( CODE -> DOUBLE UNSIGNED -- ) OVER SWAP + LOCALS| END | BEGIN DUP END < WHILE DUP 4 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
There are several differences from the first three versions, and we have to extend the template accordingly:
: DUMP ( address UNSIGNED -- ) OVER type SWAP + LOCALS| END | type BEGIN DUP END < WHILE DUP count + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ;
We now have three different placeholders, and the instantiation becomes more complex:
" DUMP.FTH" INSTANTIATE" address DATA" INSTANTIATE" type -> SINGLE" INSTANTIATE" count 8" INCLUDE
" DUMP.FTH" INSTANTIATE" address CONST" INSTANTIATE" type -> SINGLE" INSTANTIATE" count 8" INCLUDE
" DUMP.FTH" INSTANTIATE" address CODE" INSTANTIATE" type -> SINGLE" INSTANTIATE" count 8" INCLUDE
" DUMP.FTH" INSTANTIATE" address DATA -> DOUBLE" INSTANTIATE" type " INSTANTIATE" count 4" INCLUDE
" DUMP.FTH" INSTANTIATE" address CONST -> DOUBLE" INSTANTIATE" type " INSTANTIATE" count 4" INCLUDE
" DUMP.FTH" INSTANTIATE" address CODE -> DOUBLE" INSTANTIATE" type " INSTANTIATE" count 4" INCLUDE
Note that a placeholder can be replaced by a string that contains spaces, as in INSTANTIATE" type -> SINGLE", and that the replacement string can be empty, as in INSTANTIATE" type ". The space character after the placeholder must not be omitted. Here are the stack diagrams of the two overloaded versions of INSTANTIATE":
INSTANTIATE" ( CDATA -> CHARACTER UNSIGNED -- FILE )
INSTANTIATE" ( FILE -- FILE )
To implement INSTANTIATE", words that create temporary files are needed. Three words based on the low-level word (CREATE-TEMPORARY) are defined in block 991. Their definitions and their usage are similar to those of the three overloaded versions of CREATE-FILE from the File-Access word set. The character string is the name of the directory in which the temporary file is to be created, because the file name itself is chosen by DOS:
CREATE-TEMPORARY ( CDATA -> CHARACTER FAM -- FILE SIGNED )
CREATE-TEMPORARY ( CDATA -> CHARACTER UNSIGNED FAM -- FILE SIGNED )
CREATE-TEMPORARY ( CCONST -> CHARACTER UNSIGNED FAM -- FILE SIGNED )
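The following Python sketch models INSTANTIATE" together with the temporary files it relies on. It is an illustration under stated assumptions, not a port of blocks 991 to 993: create_temporary and instantiate are hypothetical names; Python's tempfile.mkstemp stands in for (CREATE-TEMPORARY), creating a file with a unique, system-chosen name in a given directory; a placeholder is replaced only where it appears as a whole, whitespace-delimited word; and, like the two overloaded versions of INSTANTIATE", instantiate accepts either a file name or an open file, so that calls can be chained. The second instantiation uses CODE -> DOUBLE as the replacement for address, which yields the last of the three double-cell versions of DUMP listed earlier.

```python
import os
import re
import tempfile


def create_temporary(directory):
    """Create an empty file with a unique, system-chosen name in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".fth", text=True)
    return os.fdopen(fd, "w+"), path


def instantiate(source, spec, directory):
    """Copy a template into a new temporary file, replacing one placeholder word.

    source is a file name (first version) or an open file (second version).
    spec is "placeholder replacement"; the replacement may be empty.
    """
    placeholder, sep, replacement = spec.partition(" ")
    if not sep:
        raise ValueError("a space must follow the placeholder")
    if isinstance(source, str):
        with open(source) as f:
            text = f.read()
    else:
        source.seek(0)
        text = source.read()
        source.close()  # the intermediate file is no longer needed
    pattern = re.compile(r"(?<!\S)" + re.escape(placeholder) + r"(?!\S)")
    text = pattern.sub(lambda m: replacement, text)  # literal replacement
    target, _ = create_temporary(directory)
    target.write(text)
    target.seek(0)  # ready to be read, like a FILE passed on to INCLUDE
    return target


TEMPLATE = (": DUMP ( address UNSIGNED -- ) OVER type SWAP + LOCALS| END | type "
            "BEGIN DUP END < WHILE DUP count + END MIN DUP ROT DUP .ADDR "
            "DO SPACE I @ .HEX LOOP CR REPEAT DROP ;\n")

with tempfile.TemporaryDirectory() as directory:
    template_path = os.path.join(directory, "DUMP.FTH")
    with open(template_path, "w") as f:
        f.write(TEMPLATE)
    result = instantiate(template_path, "address DATA", directory)
    result = instantiate(result, "type -> SINGLE", directory)
    result = instantiate(result, "count 8", directory)
    with result:
        single = result.read()
    result = instantiate(template_path, "address CODE -> DOUBLE", directory)
    result = instantiate(result, "type ", directory)
    result = instantiate(result, "count 4", directory)
    with result:
        double = result.read()

print(single, end="")
print(" ".join(double.split()))
```

The empty replacement for type leaves two adjacent spaces in the second result, which the Forth interpreter does not mind; the sketch collapses them only for display.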
Block 993 contains an overloaded version of the word TYPE, which displays the contents of the text file that is provided as an item of data type FILE:
: TYPE ( FILE -- ) BEGIN DUP PAD 80 ROT READ-LINE THROW WHILE PAD SWAP TYPE CR REPEAT DROP DROP ;
With TYPE, you can easily test template instantiations:
" DUMP.FTH" OK
INSTANTIATE" address DATA" OK
INSTANTIATE" type -> SINGLE" OK
INSTANTIATE" count 8" OK
TYPE
: DUMP ( DATA UNSIGNED -- ) OVER -> SINGLE SWAP + LOCALS| END | -> SINGLE BEGIN DUP END < WHILE DUP 8 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ; OK
" DUMP.FTH" OK
INSTANTIATE" address CODE" OK
INSTANTIATE" type " OK
INSTANTIATE" count 4" OK
TYPE
: DUMP ( CODE UNSIGNED -- ) OVER SWAP + LOCALS| END | BEGIN DUP END < WHILE DUP 4 + END MIN DUP ROT DUP .ADDR DO SPACE I @ .HEX LOOP CR REPEAT DROP ; OK
Templates can also be quite useful in object-oriented programming. In C++, class templates are used quite frequently.
Dr. Stephan Becher - March 30th, 2008