Forth system implementation


Original author: Brad Rodriguez

Compiled by: Zhao Yu, Zhang Wencui

This article is compiled from Brad Rodriguez's "Moving Forth". The original text was published in The Computer Journal #59 (January/February 1993) and is now available from the website below:

http://www.zetiffs.com/bj/papers/index.html

This series examines in depth how to implement the Forth language on a variety of processors. Although the processors used as examples are quite old, the material remains a valuable reference for understanding Forth systems. The translation follows the structure of the original; the references for each part are omitted, and the full content and source code listings can be obtained from the website above.

Table of Contents

Part 1: Design Decisions in the Forth Kernel

Part 2: Benchmarks and Case Studies of Forth Kernels

Part 3: Demystifying DOES>

Part 4: Assemble or Metacompile?

Part 5: The Z80 Primitives

Part 6: The Z80 High-Level Kernel

Part 7: CamelForth for the 8051

Part 8: CamelForth for the MC6809

Part 1: Design Decisions in the Forth Kernel

Foreword

Everyone in the Forth community has said or heard that porting Forth to a new CPU is easy. But as with many other "easy" and "obvious" tasks, not much has been written about how to do it! So when Bill Kibler suggested this topic for an article, I decided to break with the Forth writers' tradition of talking without showing, and to document a Forth implementation in black and white.

Over the course of these articles I will develop Forth systems for the MC6809, the Intel 8051, and the Zilog Z80. I use the MC6809 to illustrate a simple, conventional Forth model; in addition, I have already published an MC6809 assembler [ROD91, ROD92], and I will need a 6809 Forth for future TCJ projects. The 8051 Forth is for a university project, and it also illustrates some rather different design decisions. The Z80 Forth is for all the CP/M readers of TCJ and for the many old friends of the TRS-80.

The essential hardware

First, you must choose a CPU. I do not want to get into the debate over whether Forth runs better on this CPU or that one, because the choice of CPU usually depends on other factors, and one goal of this article is to show how to move Forth to any CPU.

Typically, a 16-bit Forth kernel needs 8K bytes of program space. For a full kernel that can compile Forth definitions, allow at least 1K byte of RAM. To use Forth's block-management system for disk storage, add 3K bytes or more of RAM for buffers. For a 32-bit Forth model, double these numbers.

These are the minimums needed to get a Forth kernel running. To run an application on the hardware, increase the PROM and RAM sizes to suit.

16 bit or 32 bit?

The word size used by Forth need not match the word size of the CPU. The smallest practical Forth is a 16-bit model, that is, one that uses 16-bit integers and 16-bit addresses. The Forth community calls this size the "cell" rather than the "word", because in Forth a "word" is a Forth definition (comparable to a named subroutine in other high-level languages).

Almost all 8-bit CPUs run 16-bit Forths. This usually requires explicit coding of double-byte arithmetic, although some 8-bit CPUs support a few 16-bit operations directly. 16-bit CPUs usually run 16-bit Forths, although the same techniques can be used to write a 32-bit Forth on a 16-bit machine; 32-bit Forths do run on the Intel 8086/8088.

32-bit CPUs typically run 32-bit Forths. In practice, a smaller model rarely saves code space or processor time. However, I have seen a 16-bit Forth written for the MC68000. It halves the size of application code, because high-level Forth definitions become lists of 16-bit addresses rather than lists of 32-bit addresses. Most MC68000 systems, though, have plenty of RAM, so the effort is seldom necessary.

All of the examples described in this article are 16-bit Forth systems running on 8-bit CPUs.

The threading technique

"Threaded code" is the hallmark of Forth. A Forth "thread" is simply a list of addresses of subroutines to be executed. You can think of it as a list of subroutine calls with the CALL instructions removed. Over the years many threading variations have been devised. To choose among them, you need to understand how each one works and what its tradeoffs are.

Indirect Threaded Code (ITC)

Indirect Threaded Code (ITC) is the classical Forth threading technique. It was used in fig-Forth and F83, and it is described in most books about Forth. All the later threading schemes are "improvements" on it, so you need to understand ITC first.

Let us look at the definition of a Forth word SQUARE:

: SQUARE  DUP * ;

In a typical ITC Forth, this appears in memory as shown in Figure 1. (The header will be discussed later; it holds housekeeping information used for compilation and is not involved in threading.)

Figure 1. Memory layout of a definition in an ITC Forth

Suppose SQUARE is encountered while some other Forth word is executing. Forth's Interpreter Pointer (IP) will be pointing to a cell in memory, inside that "other" word, which contains the address of the word SQUARE. (To be precise, that cell contains the address of SQUARE's Code Field.) The interpreter fetches that address and then uses it to fetch the contents of SQUARE's Code Field. Those contents are yet another address: the address of a machine language subroutine that performs the word SQUARE.

We can express the above description by pseudo code as follows:

    (IP) -> W     fetch the memory pointed to by IP into the W register; W now holds the address of the Code Field
    IP+2 -> IP    advance IP, just like a program counter (assuming 2-byte addresses in the thread)
    (W) -> X      fetch the memory pointed to by W into the X register; X now holds the address of the machine code
    JP (X)        jump to the address in the X register

This illustrates an important but rarely explained principle: the address of the Forth word just entered is kept in the W register. CODE words do not need this information, but all other kinds of Forth words do.

If SQUARE were written in machine code, this would be the end of the story: the machine code would be executed and would then jump back to the Forth interpreter, which, since IP was incremented, is pointing to the next word to be executed. This is why the Forth interpreter is usually called NEXT.

However, SQUARE is a high-level "colon" definition: it holds a "thread", a list of addresses. To perform this definition, the Forth interpreter must be restarted at a new location, the Parameter Field of SQUARE. Of course, the interpreter's old location must be saved so that the "other" Forth word can resume once SQUARE is finished. This is just like a subroutine call! The machine language action of SQUARE is simply to push the old IP, set IP to the new location, run the interpreter, and pop the IP when SQUARE is done. (As you can see, IP is the "program counter" of high-level Forth.) This action is called DOCOLON or ENTER in various Forths:

    PUSH IP       push IP onto the "return address stack"
    W+2 -> IP     W still points to the Code Field, so W+2 is the address of the body of the definition (assuming 2-byte addresses; other Forths may differ)
    JUMP NEXT     jump to the interpreter ("NEXT")

This identical code fragment is used by all high-level (threaded) Forth definitions! This answers two questions:

• Why does a Forth definition hold a pointer to this code fragment rather than the fragment itself? Because over hundreds of definitions, the savings in space add up.

• Why is this technique called "indirect" threading? Because the Code Field holds the address of the machine code rather than the code itself.

The "return from subroutine" action is performed by the word EXIT, which is compiled into the definition by ";". (Some Forth systems call it ;S.) EXIT executes the following machine language routine:

    POP IP        pop IP from the "return address stack"
    JUMP NEXT     jump to the interpreter

Note the characteristics of ITC: every Forth word has a one-cell Code Field, and a colon definition compiles one cell for each word it uses. To reach the next machine code to execute, the Forth interpreter must perform a double indirection (first through IP, then through W).
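The NEXT, ENTER, and EXIT pseudo-code above can be tried out in a small simulation. The following Python sketch is only a model under stated assumptions: memory is a list of cells (so addresses advance by 1, not 2), "machine code" routines are Python functions kept in a separate table, and the names `machine`, `code_word`, `colon_word`, `run`, and `HALT` are hypothetical helpers introduced here, not part of any Forth.

```python
# Sketch of an ITC inner interpreter. Addresses count cells, not bytes,
# so "IP+2" in the pseudo-code becomes "+1" here.
mem = []                 # threaded-code memory; list index = cell address
code = []                # "machine code" routines; list index = machine address
ps, rs = [], []          # parameter stack and return stack
reg = {"ip": 0, "w": 0, "run": False}   # virtual registers plus a run flag

def machine(fn):
    """Install a machine-code routine and return its address."""
    code.append(fn)
    return len(code) - 1

def code_word(fn):
    """CODE word: a one-cell Code Field holding the address of its machine code."""
    mem.append(machine(fn))
    return len(mem) - 1              # Code Field address

def next_():
    reg["w"] = mem[reg["ip"]]        # (IP) -> W
    reg["ip"] += 1                   # advance IP
    x = mem[reg["w"]]                # (W) -> X
    code[x]()                        # JP (X)

def do_enter():
    rs.append(reg["ip"])             # PUSH IP
    reg["ip"] = reg["w"] + 1         # the body starts right after the Code Field

def do_exit():
    reg["ip"] = rs.pop()             # POP IP

def do_halt():
    reg["run"] = False               # hypothetical helper that stops the demo

ENTER = machine(do_enter)
EXIT = code_word(do_exit)
HALT = code_word(do_halt)
DUP = code_word(lambda: ps.append(ps[-1]))
STAR = code_word(lambda: ps.append(ps.pop() * ps.pop()))

def colon_word(*thread):
    """Colon definition: Code Field -> ENTER, then the thread, then EXIT."""
    cfa = len(mem)
    mem.extend([ENTER, *thread, EXIT])
    return cfa

def run(cfa, *args):
    """Execute one word from a two-cell top-level thread: the word, then HALT."""
    ps.extend(args)
    reg["ip"] = len(mem)
    mem.extend([cfa, HALT])
    reg["run"] = True
    while reg["run"]:                # a real NEXT is jumped to, not looped
        next_()
    return ps.pop()

SQUARE = colon_word(DUP, STAR)       # : SQUARE  DUP * ;
result = run(SQUARE, 7)
```

Each call to `next_` performs the two fetches described above: one through IP to find the Code Field, and one through W to find the machine code. A driver loop stands in for the jump back to NEXT that ends every real CODE word.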

ITC is neither the smallest nor the fastest threading technique. It may be the simplest, although DTC (described next) is really no more complex. So why are so many Forth systems indirect-threaded? Mainly because the earlier Forths used as models were indirect-threaded. These days, DTC is becoming more popular.

So when should ITC be used? Of the various techniques, ITC produces the cleanest and most uniform definitions: they contain nothing but addresses. If that matters to you, ITC is a good fit. If your code manipulates the insides of definitions, the simplicity and uniformity of ITC can also improve portability.

In addition, ITC is the classical Forth model, which makes it well suited to teaching.

Finally, on some early CPUs that lack a subroutine call instruction, such as the 1802, ITC is often more efficient than DTC.

Direct Threaded Code (DTC)

Direct Threaded Code (DTC) differs from ITC in only one respect: instead of holding the address of some machine code, the Code Field holds the machine code itself.

Note that I am not saying that the complete ENTER code is included in every colon definition. I mean that in "high-level" Forth words the Code Field holds a subroutine call, as shown in Figure 2. A colon definition, for example, holds a call to the ENTER subroutine.

Figure 2. Memory layout of a definition in a DTC Forth

The NEXT pseudo-code for direct threading is very simple:

    (IP) -> W     fetch the memory pointed to by IP into the W register
    IP+2 -> IP    advance IP (assuming 2-byte addresses)
    JP (W)        jump to the address in the W register

DTC gains speed: the interpreter now performs only a single indirection. On the Z80, this shrinks the NEXT routine, the most frequently executed code fragment in the Forth kernel, from eleven instructions to seven.
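For comparison with ITC, here is a minimal Python model of direct threading. It rests on the same kind of assumptions as any such sketch: memory is a list of cells, and a Python function stored in a cell stands in for machine code that physically sits in the Code Field. The helper names (`code_word`, `colon_word`, `run`, `HALT`) are hypothetical. This sketch follows the "jump to ENTER" style, in which ENTER finds the body through W.

```python
# Sketch of a DTC inner interpreter: the Code Field cell holds the code itself.
mem = []                 # list index = cell address
ps, rs = [], []          # parameter stack and return stack
reg = {"ip": 0, "w": 0, "run": False}

def code_word(fn):
    """CODE word: the machine code sits directly at the Code Field."""
    mem.append(fn)
    return len(mem) - 1

def next_():
    reg["w"] = mem[reg["ip"]]        # (IP) -> W
    reg["ip"] += 1                   # advance IP
    mem[reg["w"]]()                  # JP (W): run the code located at address W

def enter():
    rs.append(reg["ip"])             # PUSH IP
    reg["ip"] = reg["w"] + 1         # the body follows the Code Field

def do_exit():
    reg["ip"] = rs.pop()             # POP IP

def do_halt():
    reg["run"] = False               # hypothetical helper that stops the demo

EXIT = code_word(do_exit)
HALT = code_word(do_halt)
DUP = code_word(lambda: ps.append(ps[-1]))
STAR = code_word(lambda: ps.append(ps.pop() * ps.pop()))

def colon_word(*thread):
    """Colon definition: the Code Field stands in for a jump to ENTER."""
    cfa = len(mem)
    mem.extend([enter, *thread, EXIT])
    return cfa

def run(cfa, *args):
    ps.extend(args)
    reg["ip"] = len(mem)
    mem.extend([cfa, HALT])          # top-level thread: the word, then HALT
    reg["run"] = True
    while reg["run"]:
        next_()
    return ps.pop()

SQUARE = colon_word(DUP, STAR)       # : SQUARE  DUP * ;
result = run(SQUARE, 9)
```

Compared with an ITC interpreter, `next_` here has one fewer fetch: there is no separate table of machine-code addresses to go through.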

DTC costs space: in a Z80 Forth, every high-level definition becomes one byte longer, because the 2-byte address used by ITC is replaced by a 3-byte CALL instruction. This is not universally true, however. In a 32-bit MC68000 Forth, a 4-byte BSR replaces a 4-byte address, for no net change. And the Zilog Super8 has instructions intended for DTC Forth: a 1-byte ENTER instruction replaces the 2-byte address, so a DTC Forth is actually smaller than an ITC Forth.

Of course, CODE definitions in DTC are also two bytes shorter, because they no longer need the pointer.

I used to think that high-level definitions in a DTC Forth had to use a subroutine call in the Code Field. Frank Sergeant's Pygmy Forth [SER90] shows that a simple jump can be used instead; it is just as easy to implement and usually faster.

Guy Kelly has compiled a superb review of Forth systems implemented on the IBM PC [KEL92], which I recommend to all Forth implementers.

Of the 19 Forth implementations he studied, 10 used DTC, 7 used ITC, and 2 used subroutine threading (discussed below). I therefore recommend that all new Forth kernels use direct threading rather than indirect threading.

Jump to NEXT, or code it in-line?

The Forth inner interpreter, NEXT, is a routine common to all CODE definitions. You can keep a single copy of it and have every CODE word jump to it. (Note: you jump to NEXT; a subroutine call is not needed.)

However, the speed of NEXT is critical to the speed of the entire Forth system. From this point of view NEXT is better coded in-line, which is often done by defining NEXT as an assembler macro.

This is the usual speed versus space tradeoff: in-line NEXT is always faster, but almost always larger. The total size increase is the number of extra bytes needed for in-line expansion multiplied by the number of CODE words in the system. Sometimes there is no tradeoff at all: on the MC6809, an in-line NEXT is shorter than a JUMP instruction!

Subroutine Threaded Code (STC)

A high-level Forth definition is nothing but a list of subroutines to be executed. You do not need an interpreter to achieve this; you get the same effect by simply stringing together a series of subroutine calls:

SQUARE:
    CALL DUP
    CALL *        ; or a suitable alphanumeric name, because some assemblers do not accept * as a subroutine name
    RET

See Figure 3. This representation has been used to explain Forth threading techniques to assembly language programmers [KOG82].

Figure 3. Memory layout of a definition in an STC Forth

STC is a uniform representation: colon definitions and CODE words are no longer different. "Defined words" (a Forth term for words created by defining words such as VARIABLE and CONSTANT) are handled as in DTC: the Code Field begins with a jump or call to machine code located elsewhere.
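Subroutine threading maps directly onto ordinary function calls, so it can be illustrated without any interpreter at all. In this Python sketch (an analogy, not an implementation of a real STC Forth), the host language's call stack plays the part of the return stack, and the function names `dup`, `star`, `square`, and `fourth_power` are chosen here for illustration.

```python
ps = []                  # parameter stack

def dup():               # CODE word
    ps.append(ps[-1])

def star():              # CODE word; "*" is not a legal name, as in many assemblers
    b = ps.pop()
    a = ps.pop()
    ps.append(a * b)

def square():            # SQUARE: CALL DUP / CALL * / RET
    dup()
    star()

def fourth_power():      # a colon definition calling another colon definition
    square()
    square()

ps.append(5)
square()                 # leaves 25 on the parameter stack
```

There is no IP and no NEXT here: a colon definition and a CODE word are called in exactly the same way, which is the uniformity described above.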

The main disadvantage of STC is that subroutine call instructions are usually larger than simple addresses. On the Z80, for example, the size of colon definitions increases by 50%, and most of your application consists of colon definitions. By contrast, on the 32-bit MC68000 there may be no size increase at all when 4-byte addresses are replaced by 4-byte BSRs, although if your code exceeds 64K, some of those addresses must be replaced by 6-byte JSRs.
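The size figures quoted above follow from simple arithmetic on the bytes needed per word reference. The Python sketch below just restates that arithmetic; `growth_percent` is a hypothetical helper, and it counts only the references in the body of a colon definition, ignoring Code Fields and the EXIT or RET at the end.

```python
def growth_percent(address_bytes, call_bytes):
    """Size increase when each address in a thread becomes a call instruction."""
    return (call_bytes - address_bytes) / address_bytes * 100

z80 = growth_percent(2, 3)           # 2-byte address -> 3-byte CALL
m68k_bsr = growth_percent(4, 4)      # 4-byte address -> 4-byte BSR
m68k_jsr = growth_percent(4, 6)      # 4-byte address -> 6-byte JSR
```

With these inputs the Z80 case comes out at 50%, the BSR case at 0%, and the JSR case at 50%, matching the figures in the text.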

Subroutine threading may be faster than direct threading. STC saves the time spent in the interpreter, but every reference to a Forth word costs a push and a pop of a return address. In a DTC Forth, only high-level definitions cause return stack activity. On the MC6809 or the Zilog Super8, DTC is faster than STC.

STC has one more advantage: it does not need an IP register. Some processors, such as the Intel 8051, are short of address registers; eliminating the IP virtual register can really simplify the kernel and speed it up.

STC with in-line expansion, optimization, and direct compilation

On older 8-bit CPUs, almost every Forth primitive takes several machine instructions, but on more powerful CPUs a Forth primitive is sometimes a single machine instruction. For example, on the 32-bit MC68000, DROP is simply:

    ADDQ #4,An    where An is Forth's PSP (parameter stack pointer) register

In a subroutine-threaded Forth, using DROP in a colon definition produces a sequence like this:

        BSR ...
        BSR DROP      calls the DROP routine below
        BSR ...       execution resumes here after the RTS

DROP:   ADDQ #4,An
        RTS

ADDQ is a 2-byte instruction. Why write a 4-byte subroutine call to a 2-byte instruction? No matter how many times DROP is used, the subroutine call saves nothing. If the ADDQ is coded directly into the stream of BSRs, the resulting code is smaller and runs faster. Some Forth compilers perform this "in-line expansion" of CODE words [CUR93a].

The disadvantage of in-line expansion is that decompiling back to the original source becomes very difficult. As long as subroutine calls are used, you still have pointers (the subroutine addresses) to the Forth words that make up the thread, and from the pointers you can obtain the names of the words. Once a word is expanded into in-line code, all knowledge of where that code came from is lost.

Besides speed and size, in-line expansion has another advantage: the potential for code optimization. For example, the Forth sequence

    3 +

is compiled by a 68000 STC as:

    BSR LIT
    .DW 3
    BSR PLUS

With in-line expansion, however, it can be optimized into a single machine instruction.
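One way to picture this optimization is as a peephole pass over the compiled call stream. The Python sketch below is a toy under stated assumptions: compiled code is represented as a list of tuples, and the pattern "call LIT, in-line value, call PLUS" is folded into one hypothetical `ADD_IMMEDIATE` pseudo-instruction (a placeholder, not a real 68000 mnemonic). It is not taken from any of the cited compilers.

```python
def peephole(code):
    """Fold 'BSR LIT / .DW n / BSR PLUS' into a single add-immediate."""
    out = []
    i = 0
    while i < len(code):
        if (i + 2 < len(code)
                and code[i] == ("BSR", "LIT")
                and code[i + 1][0] == ".DW"
                and code[i + 2] == ("BSR", "PLUS")):
            out.append(("ADD_IMMEDIATE", code[i + 1][1]))   # one instruction
            i += 3                                          # skip the folded items
        else:
            out.append(code[i])
            i += 1
    return out

compiled = [("BSR", "DUP"), ("BSR", "LIT"), (".DW", 3), ("BSR", "PLUS"), ("RTS",)]
optimized = peephole(compiled)
```

After the pass, the references to LIT and PLUS are gone from the stream, which is also why decompiling optimized code back to Forth words becomes difficult.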

Optimizing Forth compilers is a vast field and a very active area of Forth language research. It cannot be fully discussed here; see [SCO89] and [CUR93b]. The end result of optimizing STC is a Forth compiler that generates "pure" machine code, just like a C or Fortran compiler.

Token Threaded Code (TTC)

DTC and STC aim to improve the execution speed of Forth programs at some cost in memory. Now let us move in the other direction from ITC, toward something slower but smaller.

The purpose of a Forth thread is to specify the Forth words (subroutines) to be executed. Suppose a 16-bit Forth system has at most 256 Forth words. Then each word can be identified by an 8-bit number. Instead of a list of 16-bit addresses, we can use a list of 8-bit identifiers, or "tokens", and the size of colon definitions is halved.

A token-threaded Forth needs a table that records the addresses of all Forth words, as shown in Figure 4. The token value is an index into this table, which is used to find the Forth word that corresponds to a given token. This adds one more level of indirection to the Forth interpreter, so it is slower than an "address-threaded" Forth.

Figure 4. Memory layout of a definition in a TTC Forth

The principal advantage of token threading is small size. TTC is most common in handheld computers and other applications with severe size constraints. The single table of "entry points" to all Forth words also simplifies linking separately compiled modules.

The disadvantage of TTC is speed: TTC makes the slowest Forths of all the techniques, and a TTC compiler is more complex than the others. If your application needs more than 256 Forth definitions, some open-ended encoding scheme is required to mix 8-bit and larger tokens.
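The table lookup at the heart of token threading can be sketched as follows. This Python model is an illustration under assumptions: threads are stored in a `bytearray` (one byte per token), the table maps each token either to a Python function standing in for machine code or to the start of a thread, and all helper names (`define_code`, `define_colon`, `run`, `HALT`) are hypothetical.

```python
mem = bytearray()        # colon-definition bodies: one byte per token
table = []               # token -> ("code", function) or ("colon", body address)
ps, rs = [], []          # parameter stack and return stack
reg = {"ip": 0, "run": False}

def define_code(fn):
    table.append(("code", fn))
    return len(table) - 1            # the new word's token

def define_colon(*tokens):
    table.append(("colon", len(mem)))
    mem.extend(tokens)               # each token must fit in one byte
    mem.append(EXIT)
    return len(table) - 1

def next_():
    token = mem[reg["ip"]]           # fetch an 8-bit token from the thread
    reg["ip"] += 1
    kind, target = table[token]      # the extra indirection: table lookup
    if kind == "code":
        target()
    else:                            # ENTER for a colon definition
        rs.append(reg["ip"])
        reg["ip"] = target

def do_exit():
    reg["ip"] = rs.pop()

def do_halt():
    reg["run"] = False               # hypothetical helper that stops the demo

EXIT = define_code(do_exit)
HALT = define_code(do_halt)
DUP = define_code(lambda: ps.append(ps[-1]))
STAR = define_code(lambda: ps.append(ps.pop() * ps.pop()))
SQUARE = define_colon(DUP, STAR)     # body is 3 bytes: DUP, *, EXIT
square_body_bytes = len(mem)

def run(token, *args):
    ps.extend(args)
    reg["ip"] = len(mem)
    mem.extend([token, HALT])        # top-level thread: the word, then HALT
    reg["run"] = True
    while reg["run"]:
        next_()
    return ps.pop()

result = run(SQUARE, 6)
```

The body of SQUARE occupies three bytes here, where a thread of 16-bit addresses would need six. Because a `bytearray` rejects values above 255, the sketch also shows the limit of 256 distinct tokens.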

Speaking of token threading, one could imagine a 32-bit Forth system using 16-bit tokens, but how many 32-bit systems are actually constrained by memory size?

Segment Threaded Code

Since there are so many systems derived from the Intel 8086, segment threading deserves a brief mention. Instead of using "normal" byte addresses within a 64K segment, this technique uses paragraph addresses (on the Intel 8086, a paragraph is 16 bytes). The interpreter can then load these addresses into segment registers rather than the usual address registers. This allows a 16-bit Forth model to access the full 1M byte of 8086 memory efficiently.

The principal disadvantage of segment threading is the 16-byte "granularity" of memory. Every Forth word must be aligned on a 16-byte boundary, and since Forth words have random lengths, an average of 8 bytes is wasted per word.
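The "about 8 bytes" figure can be checked with a short calculation. The Python sketch below assumes, purely for illustration, that word lengths are spread evenly over 1 to 256 bytes; `padding` and `average_padding` are hypothetical helper names.

```python
def padding(length, paragraph=16):
    """Bytes wasted to align the next word on a paragraph boundary."""
    return (-length) % paragraph

def average_padding(max_length=256, paragraph=16):
    """Mean padding for word lengths 1..max_length, each equally likely."""
    lengths = range(1, max_length + 1)
    return sum(padding(n, paragraph) for n in lengths) / len(lengths)

waste = average_padding()
```

Under that assumption the mean comes out at 7.5 bytes per word, which agrees with the rough figure of 8 bytes given above.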

Next to the threading technique, the allocation and use of the CPU registers is the most crucial design decision, and probably the most difficult. The availability of CPU registers can determine which threading technique can be used, and even what the memory map will be.

The classical Forth registers

The classical Forth virtual machine has five "virtual registers". They are abstract entities used by the primitive operations of Forth. NEXT, ENTER, and EXIT were defined earlier in terms of these abstract registers.

Each register is one cell wide; that is, in a 16-bit Forth they are 16-bit registers. (As you will see later, there are exceptions.) They need not all be CPU registers: if your CPU does not have enough registers, some of them can be kept in memory. I will describe them in order of importance, so the registers described last are the first candidates to be moved into memory.

W is the Working register, and it is used for many things. It should be an address register: you must be able to fetch and store memory using the contents of W as the address. You also need to be able to do arithmetic on W. In a DTC Forth, you must also be able to jump indirect through W. W is used in every Forth word. On a CPU with only one register, you would use that register for W and keep everything else in memory; of course, such a system would be incredibly slow.

IP is the Interpreter Pointer. It is used by every Forth word (through NEXT, ENTER, or EXIT). IP must be an address register, and you must be able to increment it. Subroutine-threaded Forths do not need this register.

PSP is the Parameter Stack Pointer (or data stack pointer), sometimes called simply SP. I use PSP because "SP" is frequently the name of a CPU hardware register, and the two should not be confused. Most CODE words use this register. PSP must be a stack pointer, or an address register that can be incremented and decremented. It is an added benefit if indexed addressing can be used with PSP.

RSP is the Return Stack Pointer, sometimes called simply RP. It is used by colon definitions in ITC and DTC Forths, and by all words in STC Forths. RSP must be a stack pointer, or an address register that can be incremented and decremented.

If at all possible, W, IP, PSP, and RSP should be kept in CPU registers. The virtual registers that follow can be kept in memory, although there is usually a speed advantage in keeping them in CPU registers too.

X is a working register. It is not considered one of the classical Forth registers, even though classical ITC Forths need it for the second indirection. In ITC, you must be able to jump indirect through X. X may also be used by CODE words to hold an operand for arithmetic, which is particularly important on processors that cannot use memory as an operand. For example, addition on the Z80 might be implemented as follows (in pseudo-code):

    POP W
    POP X
    X+W -> W
    PUSH W

Sometimes another working register, Y, is also defined.

UP is the User Pointer, which holds the base address of the current task's user area. UP is usually added to an offset and used by high-level Forth definitions. If the CPU can do indexed addressing from the UP register, CODE words can access user variables more easily and quickly. If you have registers to spare, use one for UP. Single-task Forths do not need UP.

If X is needed, it is more important to keep X in a CPU register than UP. UP is the easiest of the Forth virtual registers to move into memory.

Use of the hardware stack

Most CPUs have a stack pointer as part of their hardware, used by interrupts and subroutine calls. How should this stack pointer map onto the Forth virtual registers? Should it be the PSP or the RSP? It depends on the situation. It is generally said that in ITC and DTC Forths the PSP is used more often than the RSP. If your CPU has few address registers, and PUSH and POP are faster than explicit memory references, use the hardware stack as the Parameter Stack.

On the other hand, if your CPU is rich in addressing modes, and in particular allows indexed addressing, you should assign a general-purpose address register to PSP. In that case, use the hardware stack as the Return Stack.

Sometimes neither choice applies. On the TMS320C25, for example, the hardware stack is only 8 cells deep, which makes it all but useless for Forth. Its hardware stack is therefore used only for interrupts, and both PSP and RSP must be general-purpose address registers. Note that ANS Forth specifies a minimum Parameter Stack of 32 cells and a minimum Return Stack of 24 cells; I prefer 64 cells for each.

You will occasionally encounter dogma such as "the hardware stack must be the Parameter Stack" or "the hardware stack must be the Return Stack". Instead, code a few Forth primitives, such as SWAP, OVER, @, !, +, and 0=, and see which choice gives smaller and faster code.

By the way, DUP and DROP are of little value for this test.

You may also reach surprising conclusions. Gary Bergstrom has pointed out that an MC6809 DTC Forth can save a few cycles by using the MC6809 user stack pointer as IP, so that NEXT becomes a POP. He uses an index register as one of Forth's stack pointers.

Top of stack (TOS) in a register

Forth's performance improves significantly if the top element of the Parameter Stack (TOS) is kept in a register. Many Forth words (such as 0=) then no longer access the stack at all. Other words do the same number of pushes and pops, only at a different place in the code. Only a few Forth words (such as DROP and 2DROP) become more complicated, because they must also update the TOS register.

With the top stack element in a register, follow these rules when writing CODE words:

• A word that removes items from the stack must pop the "new" TOS into the register.

• A word that adds items to the stack must push the "old" TOS onto the stack (unless, of course, the word consumes it).
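The two rules above can be made concrete with a small model. In this Python sketch, an attribute `tos` plays the TOS register and a list holds the rest of the stack in "memory"; the class name `TosStack`, its methods, and the access counter are assumptions introduced here for illustration. As in a real kernel, the register holds a meaningless value while the stack is empty, and that value is pushed along with everything else.

```python
class TosStack:
    """Parameter stack with the top element cached in a 'register'."""

    def __init__(self):
        self.mem = []        # cells below the top, kept in memory
        self.tos = 0         # TOS register (meaningless while the stack is empty)
        self.accesses = 0    # memory pushes and pops performed so far

    def _push(self, value):
        self.mem.append(value)
        self.accesses += 1

    def _pop(self):
        self.accesses += 1
        return self.mem.pop()

    def lit(self, n):        # adds an item: push the "old" TOS, load the new one
        self._push(self.tos)
        self.tos = n

    def dup(self):           # adds an item: push the "old" TOS; register unchanged
        self._push(self.tos)

    def drop(self):          # removes an item: pop the "new" TOS into the register
        self.tos = self._pop()

    def plus(self):          # two items in, one out: a single pop
        self.tos = self._pop() + self.tos

    def zero_equal(self):    # 0= works entirely in the register
        self.tos = -1 if self.tos == 0 else 0

s = TosStack()
s.lit(3)
s.lit(4)
s.plus()                     # TOS is now 7
s.zero_equal()               # TOS is now 0 (false)
```

The sequence `3 4 + 0=` costs three memory accesses in this model: two pushes for the literals and one pop for the addition, while `0=` touches no memory at all.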

If your CPU has at least 6 cell-size registers, I recommend keeping TOS in one of them. I consider TOS more important than UP, but less important than W, IP, PSP, and RSP. The TOS register takes over many of the functions of the X register, and it is more useful still if it can be used for memory addressing. The PDP-11, Z8, and MC68000 are good candidates.

Guy Kelly [KEL92] studied 19 Forth systems on the IBM PC, some of which keep TOS in a register.

I think the TOS-in-register idea has not been widely accepted because of several misconceptions:

• that it adds instructions;

• that the top stack element must be accessible as memory;

• that words such as PICK and ROLL are so important that having to recode them for TOS-in-register is a serious cost.

What about keeping the top two stack elements in registers? With only TOS in a register, the total number of operations stays essentially the same: a push is still a push, whether it happens before or after the operation you are performing. Buffering two stack elements, on the other hand, adds a lot of code: a push becomes a push followed by a move. Keeping two elements in registers pays off only on dedicated Forth chips of the RTX2000 class, or with exceptionally clever optimizing compilers; elsewhere it sounds clever but has little practical value.

Some examples of actual allocation

Here are the register assignments used by Forths for a number of different CPUs. From this table we can infer the design decisions made by the author of each Forth system.

[1] F83. [2] Pygmy Forth.

Figure 5. Register assignments

"SP" refers to the hardware stack pointer. "ZPAGE" refers to values kept in page zero of the 6502's memory, which are almost as useful as values kept in registers, and sometimes more useful; for example, they can be used for memory addressing. "Fixed" means that Payne's 8051 Forth has a single, immovable user area, and UP is a hard-coded constant.

Did you notice anything odd in the table above? The 6502 Forth is a 16-bit model, but it uses 8-bit stack pointers.

In practice, PSP, RSP, and UP can be made smaller than the Forth cell size. This is because the stacks and the user area are small compared with the CPU's whole addressable memory. Each stack may be as small as 64 cells, and the user area rarely exceeds 128 cells. You simply need to ensure one of the following:

• these data areas are confined to a small region of memory, so that a short address can be used; or

• the high address bits are provided in some other way, for example by a page select.

On the 6502, the CPU design confines the hardware stack to page one of memory (addresses 0x01xx). The 8-bit stack pointer can be used for the Return Stack. The Parameter Stack is kept in page zero of RAM and is accessed indirectly through an 8-bit index register.

On the 8051, you can use the 8-bit registers R0 and R1 to address external RAM, provided that you explicitly output the high 8 bits of the address to port 2. This allows a "page select" for the two stacks.

UP differs from PSP and RSP: it simply provides a base address and is never incremented or decremented. It is therefore practical to supply only the high bits of this virtual register; the low bits must then come from whatever indexed addressing technique is used. For example, on the MC6809 you can use the DP register to hold the high 8 bits of UP and then use direct page addressing to reach any of the 256 locations in that page. This forces the user area to begin at an address of the form 0x??00 and limits it to 128 cells, neither of which is a great hardship. On the Intel 8086, you could likewise use a segment register to hold the base address of the user area.
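The address arithmetic behind these "narrow" registers is simple enough to write down. The Python sketch below only models how a full 16-bit address is formed from an 8-bit register plus high bits supplied elsewhere; the function names and the example page value 0x7E are hypothetical.

```python
def stack_address_6502(sp):
    """6502 hardware stack: the high byte is fixed at page one by the CPU."""
    return 0x0100 | (sp & 0xFF)

def paged_address(page, low):
    """High 8 bits from a page select (DP on the 6809, port 2 on the 8051),
    low 8 bits from the narrow register or instruction offset."""
    return ((page & 0xFF) << 8) | (low & 0xFF)

def push_pointer(sp):
    """An 8-bit stack pointer wraps within its page when decremented."""
    return (sp - 1) & 0xFF

top = stack_address_6502(0xFF)           # 0x01FF
user_var = paged_address(0x7E, 0x10)     # 0x7E10: offset 0x10 in a user area at 0x7E00
wrapped = push_pointer(0x00)             # 0xFF: stays inside the same page
```

Because the low byte can only take 256 values, a 16-bit-cell user area addressed this way holds at most 128 cells, which is the limit mentioned above.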

References

[CUR93a] Curley, Charles, "Life in the FastForth Lane," awaiting publication in Forth Dimensions. Description of a 68000 subroutine-threaded Forth.

[CUR93b] Curley, Charles, "Optimizing in a BSR/JSR Threaded Forth," awaiting publication in Forth Dimensions. Single-pass code optimization for FastForth, in only five screens of code! Includes listing.

[KEL92] Kelly, Guy M., "Forth Systems Comparisons," Forth Dimensions XIII:6 (Mar/Apr 1992). Also published in the 1991 FORML Conference Proceedings. Both available from the Forth Interest Group, P.O. Box 2154, Oakland, CA 94621. Illustrates design tradeoffs of many 8086 Forths with code fragments and benchmarks; highly recommended!

[KOG82] Kogge, Peter M., "An Architectural Trail to Threaded-Code Systems," IEEE Computer, vol. 15 no. 3 (Mar 1982). Remains the definitive description of various threading techniques.

[ROD91] Rodriguez, B.J., "B.Y.O. Assembler," Part 1, The Computer Journal #52 (Sep/Oct 1991). General principles of writing Forth assemblers.

[ROD92] Rodriguez, B.J., "B.Y.O. Assembler," Part 2, The Computer Journal #54 (Jan/Feb 1992). A 6809 assembler in Forth.

[SCO89] Scott, Andrew, "An Extensible Optimizer for Compiling Forth," 1989 FORML Conference Proceedings, Forth Interest Group, P.O. Box 2154, Oakland, CA 94621. Good description of a 68000 optimizer; no code provided.

Forth implementations

[CUR86] Curley, Charles, real-Forth for the 68000, privately distributed (1986).

[JAM80] James, John S., fig-Forth for the PDP-11, Forth Interest Group (1980).

[KUN81] Kuntze, Robert E., MVP-Forth for the Apple II, Mountain View Press (1981).

[LAX84] Laxen, H. and Perry, M., F83 for the IBM PC, version 2.1.0 (1984). Distributed by the authors, available from the Forth Interest Group or GEnie.

[LOE81] Loeliger, R. G., Threaded Interpretive Languages, BYTE Publications (1981), ISBN 0-07-038360-X. May be the only book ever written on the subject of creating a Forth-like kernel (the example used is the Z80).

[MPE92] MicroProcessor Engineering Ltd., MPE Z8/Super8 PowerForth Target, MPE Ltd., 133 Hill Lane, Shirley, Southampton, S01 5AF, U.K. (June 1992). A commercial product.

[PAY90] Payne, William H., Embedded Controller FORTH for the 8051 Family, Academic Press (1990), ISBN 0-12-547570-5. This is a complete "kit" for an 8051 Forth, including a metacompiler for the IBM PC. Hardcopy only; files can be downloaded from GEnie. Not for the novice!

[SER90] Sergeant, Frank, Pygmy Forth for the IBM PC, version 1.3 (1990). Distributed by the author, available from the Forth Interest Group. Version 1.4 is now available on GEnie, and worth the extra effort to obtain.

[TAL80] Talbot, R. J., fig-Forth for the 6809, Forth Interest Group (1980).

Part 2: Benchmarks and Case Studies of Forth Kernels

Benchmarks

By now we have worked through the design decisions for a Forth implementation, and the answer to many of them is "code it and see". You certainly do not want to write several complete Forth kernels just to evaluate different schemes. Fortunately, you can get quite a good "feel" from a small subset of the Forth kernel.

Guy Kelly [KEL92] examined the following code samples in 19 different IBM PC Forths:

• Next ... is a "inner interpreter" that links a word "string" in another word. The end of each Code definition is a most important factor that determines the speed of Forth. You have seen its ITC and DTC pseudo code; in the STC, it is the call / RET instruction.

• ENTER ... is also known as Docol or Docolon, advanced "colon" defines code domain actions. It is also critical for speed; it is not required in the start of each colon definition in STC.

• Exit ... is called S; End a colon definition code. It appears at the end of each colon definition, determines the return efficiency of the advanced subroutine. It is a RET machine instruction in the STC.

NEXT, ENTER, and EXIT show the performance of the strip mechanism. They should all be evaluated by actual coding. They also reflect whether IP, W and RSP register allocation policies are correct when implementation.

• Dovar ... "Variable", for all FORTH variables Variable's code domain action.

• DOCON ... "constant", for all Forth constant constant's code domain action. DOCON, DOVAR and ENTER show the efficiency of the parameter domain address that you can get the word being executed. This reflects your choice of W registers, in DTC Forth, points that you should put a JUMP instruction in the code domain or a CALL instruction.

• LIT ... "Text Quall". This word takes a unit value from the advanced string of Forth. There are several words that require such embedded parameters, which shows their performance. It reflects your choice of IP registers.

• @ ... Forth's memory read operation shows how fast it is available from the accessed memory from advanced Forth. This word often benefits from the stack TOS.

•! ... Forth's memory operates, reflects the capacity of memory access from the other hand. It consumes two items of the stack, so it can reflect the access efficiency of the parameter stack. It also illustrates the decision we put TOS in memory or placed in the register.

• ... add-on operation, is a typical representative of all Forth arithmetic and logical operations.

This is an excellent set of code samples. I have a few additional favorites:

• DODOES ... the code field action for words built with DOES>. It yields no new benchmark comparisons of W, IP, and RSP usage; I include it because it is the most convoluted code in the Forth kernel. If you can code the logic of DODOES, everything else is easy. The intricacies of DODOES are described later in this article.

• SWAP ... a simple stack operator, but still instructive.

• ROT ... a more complex stack operator. It gives a good idea of how easily the parameter stack can be accessed, and it is the operator most likely to need an extra temporary register. If you can code ROT without an "X" register, you probably do not need an "X" register for anything.

• 0= ... one of the few unary arithmetic operators, and the one most likely to benefit from TOS in a register.

• +! ... a most illustrative operator, combining stack access, arithmetic, memory fetch, and memory store. It is an ideal benchmark word, although it is used less frequently than the other words listed above.

These are among the most-used words in the Forth kernel, and it pays to optimize them. I will show examples of these for the MC6809, including pseudo-code. For the other processors, I will use selected code fragments to illustrate specific decisions.

Case study 1: MC6809

In the world of 8-bit CPUs, the MC6809 is the Forth programmer's dream machine. It supports two stacks! It also has two other address registers, and a wealth of orthogonal addressing modes second only to the PDP-11. ("Orthogonal" means that the modes work the same way and offer the same options for all address registers.) The two 8-bit accumulators can be treated as a single 16-bit accumulator, and there are many 16-bit operations.

The programmer's model of the MC6809 is:

A - 8-bit accumulator

B - 8-bit accumulator

Most arithmetic operations use an accumulator as the destination. The two can also be concatenated and treated as a single 16-bit accumulator D (A is the high byte, B is the low byte).

X - 16-bit index register

Y - 16-bit index register

S - 16-bit stack pointer

U - 16-bit stack pointer

All addressing modes for the X and Y registers can also be used with the S and U registers.

PC - 16-bit program counter

CC - 8-bit condition code register

DP - 8-bit direct page register

The direct addressing mode of the MC6800 family uses an 8-bit address to reach any location in memory page zero. The MC6809 allows any page to be direct-addressed; the DP register supplies the high 8 bits of the address (the page number).

Those two stack pointers are crying out for Forth use. They are equivalent, except that the CPU uses S for subroutine calls and interrupts. For consistency, let us use S for the return stack and U for the parameter stack. W and IP both need to be address registers, so X and Y are the logical choice. X and Y are equivalent, so we arbitrarily assign:

X = W and Y = IP.

Now a threading model can be chosen. I will set aside STC and TTC, to build a "conventional" Forth. The limiting factor in performance is then the NEXT routine. Let us look at it in both ITC and DTC:

ITC-NEXT:

LDX ,Y++ (8) (IP) -> W, increment IP

JMP [,X] (6) (W) -> temp, jump to the address in temp

DTC-NEXT:

JMP [,Y++] (9) (IP) -> temp, increment IP, jump to the address in temp ("temp" is internal to the MC6809)

NEXT is a single instruction in a DTC MC6809! This means you can code it in-line in two bytes, which is both smaller and faster than JMP NEXT. For comparison, the "NEXT" logic for subroutine threading looks like this:

RTS (5) ... at the end of one CODE word

JSR nextword (8) ... in the "thread", starts the next CODE word

STC takes 13 cycles to thread to the next word, whereas DTC needs only 9. This is because subroutine threading must pop and push a return address, while simple DTC or ITC threading between CODE words does not.
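The difference between the two NEXT routines is one level of indirection. The following is a minimal Python sketch of that idea, not the author's code: memory is modelled as a dictionary of cells, and the addresses and the names next_itc and next_dtc are hypothetical.

```python
# Toy model of NEXT under indirect (ITC) and direct (DTC) threading.
# Memory is a dict of cells; all addresses and function names are hypothetical.

def next_itc(mem, ip):
    """ITC: the thread cell holds a CFA, and the code field holds a code address."""
    w = mem[ip]              # (IP) -> W
    ip += 2                  # increment IP past the 2-byte cell
    code_addr = mem[w]       # (W) -> temp: the extra level of indirection
    return ip, w, code_addr, 2          # last item: number of cell reads

def next_dtc(mem, ip):
    """DTC: the thread cell holds the address of a code field that is itself code."""
    code_addr = mem[ip]      # (IP) -> temp
    ip += 2
    return ip, code_addr, 1             # last item: number of cell reads

itc_mem = {0x1000: 0x2000, 0x2000: 0x0100}   # thread cell -> code field -> machine code
dtc_mem = {0x1000: 0x2000}                   # thread cell -> code field (executable)

print(next_itc(itc_mem, 0x1000))   # new IP, W, jump target, reads
print(next_dtc(dtc_mem, 0x1000))   # new IP, jump target, reads
```

In this model the DTC version never loads W, which mirrors the single-instruction JMP [,Y++] above.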

Having chosen DTC, you must decide: does a high-level word have a Jump or a Call in its code field? The driving consideration is how quickly you can obtain the address of the parameter field that follows. Let us look at the code to ENTER a colon definition:

Using a JSR (Call):

JSR ENTER (8)

...

ENTER:

PULS W (7) get the address following the JSR into W

PSHS IP (7) save the old IP on the return stack

TFR W,IP (6) parameter field address -> IP

NEXT (9) assembler macro for JMP [,Y++]

Total: 37 cycles

Using a JMP:

JMP ENTER (4)

...

ENTER:

PSHS IP (7) save the old IP on the return stack

LDX -2,IP (6) re-fetch the code field address

LEAY 3,X (5) add 3 and put the result into IP (Y)

NEXT (9)

Total: 31 cycles

Because the MC6809 addressing modes allow an extra level of indirection automatically, the DTC NEXT on the 6809 does not use the W register. The JMP version of ENTER must therefore re-fetch the code field address (NEXT did not leave it in any register) and then add 3 to get the parameter field address. The JSR version can get the parameter field address directly by popping the return stack. Even so, the JMP version is faster.
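The two totals can be checked by adding up the cycle counts given in the listings. A small Python sketch (the dictionary and function names are only illustrative):

```python
# Cycle counts copied from the two MC6809 ENTER listings above.
jsr_version = {"JSR ENTER": 8, "PULS W": 7, "PSHS IP": 7, "TFR W,IP": 6, "NEXT": 9}
jmp_version = {"JMP ENTER": 4, "PSHS IP": 7, "LDX -2,IP": 6, "LEAY 3,X": 5, "NEXT": 9}

def total_cycles(listing):
    """Sum the per-instruction cycle counts of one listing."""
    return sum(listing.values())

print(total_cycles(jsr_version))   # 37
print(total_cycles(jmp_version))   # 31
```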

Either way, the code for EXIT is the same:

EXIT:

PULS IP pop the "saved" IP from the return stack

NEXT continue Forth interpretation

Some registers remain to be allocated. You could keep the user pointer (UP) in memory, and this Forth would still be quite fast. But then the DP register would go to waste, and there is little else it can do. Let us use a "trick" and hold the high byte of UP in the DP register (the low byte of UP is implied to be zero).

One 16-bit register is left: D. Most arithmetic operations need this register. Should it be left free as a scratch register, or used as the top of stack (TOS)? MC6809 instructions take one operand from memory, so a second working register may not be needed. And if a scratch register is needed temporarily, it is easy to push and pop D. The only way to decide is to write the benchmark primitives both ways and see which is faster.

NEXT, ENTER, and EXIT do not use the parameter stack, so their code is identical in both cases.

DOVAR, DOCON, and LIT need the same number of clock cycles either way. This illustrates the earlier point that putting TOS in a register often just changes where the push or pop takes place.

SWAP, ROT, 0=, @, and especially + are all made faster by putting TOS in a register.

However, ! and +! are slower with TOS in a register.

These words are slower because most memory-reference Forth words expect an address on top of the stack, and that address must be in an address register, so an extra TFR instruction is needed. This is why the TOS register should ideally be an address register. Unfortunately, all the MC6809 address registers are already taken by the more important W, IP, PSP, and RSP. Even so, the TOS-in-register penalty for ! and +! should be outweighed by the gains in the many arithmetic and stack operations.

Case study 2: 8051

If the MC6809 is the Forth implementer's dream, the Intel 8051 is the nightmare. It has only one general-purpose address register and one addressing mode, which always goes through the single 8-bit accumulator.

Virtually all arithmetic operations, and many logical operations, must use the accumulator. The only 16-bit operation is INC DPTR. The hardware stack must live in the 128-byte on-chip register file. Such a CPU is little more than a heap of scrap metal!

Some 8051 Forths implement a full 16-bit model, but they are too slow for our purposes. Let us make some tradeoffs to produce a faster 8051 Forth.

Our foremost design concern is that there is only one address register. So let us use the 8051 program counter as IP; that is, we build a subroutine-threaded Forth. If the compiler uses the 2-byte ACALL instead of the 3-byte LCALL wherever possible, most STC code will be as small as ITC or DTC code.

Subroutine threading implies that the return stack pointer is the hardware stack pointer. The on-chip register file has room for 64 cells, which is not enough for multiple task stacks. At this point you can consider the following strategies:

• Restrict this Forth to a single task;

• Code all Forth "definition" words so that on entry they move their return address to a software stack in external RAM;

• Do task switches by swapping the entire return stack to and from external RAM.

The second option is the slowest! Moving 128 bytes on every task switch is faster than moving two bytes in every Forth word. For now I choose the first option, leaving the third open for later.

The one and only "real" address register, DPTR, will have to do multiple duty. It becomes W, the multi-purpose working register.

In fact, two other registers can address external memory: R0 and R1. They provide only an 8-bit address; the high 8 bits are explicitly output on port 2. For stacks this is a tolerable restriction, since a stack can be limited to a 256-byte space. So we use R0 as the PSP.

The same 256-byte space can be used for the user data area. This makes P2 (port 2) the high byte of the user pointer and, as on the MC6809, the low byte is implied to be zero. Part of the 8051 programmer's model thus becomes:

Address    8051 name   Forth usage

00h        R0          low byte of PSP

01h        R1

02h        R2

03h        R3

04h        R4

05h        R5

06h        R6

07h        R7

08h-7Fh                120 bytes of return stack

81h        SP          low byte of RSP (high byte = 0)

82h-83h    DPTR        W register

A0h        P2          high byte of UP and PSP

E0h        A

F0h        B

Note that we use only register bank 0. The three additional register banks from 08h to 1Fh, and the bit-addressable region from 20h to 2Fh, are not used as such by Forth. Using bank 0 leaves the largest contiguous space for the return stack. The return stack can also be made smaller if desired.
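To illustrate how P2 and R0 together address the parameter stack, here is a minimal Python sketch. It is not taken from the article: the class and method names are hypothetical, and both the downward growth of the stack and the low-byte-at-lower-address cell layout are assumptions made for the example.

```python
# Sketch of a parameter stack addressed by P2 (high byte) and R0 (low byte).
# Names are hypothetical; a downward-growing stack and low-byte-first cells
# are assumptions, not something the article specifies.

class ParamStack8051:
    def __init__(self, p2, r0=0x00):
        self.p2 = p2 & 0xFF              # page number, output on port 2
        self.r0 = r0 & 0xFF              # 8-bit stack pointer within the page
        self.xram = bytearray(65536)     # external data memory

    def address(self):
        return (self.p2 << 8) | self.r0

    def push_byte(self, value):
        self.r0 = (self.r0 - 1) & 0xFF   # the pointer wraps inside the 256-byte page
        self.xram[self.address()] = value & 0xFF

    def pop_byte(self):
        value = self.xram[self.address()]
        self.r0 = (self.r0 + 1) & 0xFF
        return value

    def push_cell(self, value):
        self.push_byte(value >> 8)       # high byte first, so low byte lands lower
        self.push_byte(value)

    def pop_cell(self):
        lo = self.pop_byte()
        hi = self.pop_byte()
        return (hi << 8) | lo

stack = ParamStack8051(p2=0x7F)
stack.push_cell(0x1234)
top_address = stack.address()   # stays inside page 7Fh
value = stack.pop_cell()
```

Because only R0 changes on a push or pop, the stack can never leave its 256-byte page, which is the restriction the text describes.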

NEXT, ENTER, and EXIT are not needed in a subroutine-threaded Forth.

What about the top of stack? The 8051 has plenty of registers, while memory operations are expensive. Let us put TOS in R3:R2 (with R3 the high byte, in Intel fashion). Note that we cannot use the B:A register pair: the A register is the funnel through which all memory references must pass.

The 8051 uses a "Harvard" architecture: program and data are kept in separate memories. (The Z8 and TMS320 are two other examples.) The 8051, however, is a degenerate case: there is physically no way for software to write to program memory. This leaves the Forth implementer only two options:

• Cross-compile everything, including the application, and give up all hope of an interactive Forth on the 8051;

• Make some or all of the program memory also appear in data space; the easiest way is to make the two spaces overlap completely.

By comparison, the Z8 and TMS320 are more civilized: they allow writes to program memory. The implications for the design of the Forth kernel will be discussed later.

Case study 3: Z80

The Z80 is instructive because it is an extreme example of a non-orthogonal CPU. It has four different kinds of address registers. Some operations use register A as the destination, some any 8-bit register, some HL, some any 16-bit register, and so on. Some operations (such as EX DE,HL) are defined for only one combination of registers.

On a CPU such as the Z80 (or the 8086), the assignment of Forth functions must be carefully matched to the capabilities of the CPU registers. Many more tradeoffs need to be evaluated, and often the only way is to write sample code for a number of different assignments. To keep this article from turning into a code listing, I present one register assignment based on many Z80 coding experiments. It turns out that these choices can be justified by the general principles discussed earlier.

I want a "conventional" Forth, although I will use direct threading. All the "classical" virtual registers will be needed.

Ignoring the alternate register set, the Z80 has six address registers, with the following capabilities:

• BC, DE - LD A indirect, INC, DEC; also exchange DE/HL

• HL - LD r indirect, ALU indirect, INC, DEC, ADD, ADC, SBC, exchange with top of stack, JP indirect

• IX, IY - LD r indexed, ALU indexed, INC, DEC, ADD, ADC, SBC, exchange with top of stack, JP indirect (all slow)

• SP - PUSH/POP 16-bit, ADD/ADC/SBC to HL/IX/IY

BC, DE, and HL can also be manipulated as 8-bit halves.

The 8-bit register A must be left as a scratch register, since it is the destination for so many ALU and memory-reference operations.

HL is undoubtedly the most versatile register, and it is tempting to use it for each of the Forth virtual registers in turn. However, because of its versatility, and because it is the only register that can both be loaded byte by byte and used in an indirect jump, HL should be used for Forth's W, the multi-purpose working register.

IX and IY might be considered for the Forth stack pointers, because of their indexed addressing mode, which can be used in ALU operations. But there are two problems: this leaves SP without a job, and IX/IY are too slow!

Most operations on either Forth stack involve pushing or popping 16-bit quantities. With SP this takes one instruction; with IX or IY it takes four. So one of the two Forth stacks should use SP, and it should be the parameter stack, since it is used more heavily than the return stack.

What about Forth's IP? Mostly, IP fetches from memory and auto-increments, so IX/IY offer no programming advantage over BC/DE. But speed is essential for IP, and BC/DE are faster. Let us put IP in DE: it can be exchanged with HL, which adds versatility.

A second Z80 register pair (other than W) is needed for 16-bit arithmetic. Only BC is left, and it can be used for addressing or for ALU operations with A. But should BC be a second working register "X", or the top of stack? Only code will tell. For now, let us optimistically assume BC = TOS.

That leaves the RSP and UP functions, and the IX and IY registers, unassigned. IX and IY are equivalent, so we set IX = RSP and IY = UP.

Thus the register assignment for the Z80 Forth is as follows.

BC = TOS    IX = RSP

DE = IP     IY = UP

HL = W      SP = PSP

Now look at NEXT for the DTC Forth:

DTC-NEXT:

LD A,(DE) (7) (IP) -> W, increment IP

LD L,A (4)

INC DE (6)

LD A,(DE) (7)

LD H,A (4)

INC DE (6)

JP (HL) (4) jump to the address in W

An alternate version (same number of clock cycles):

DTC-NEXT:

EX DE,HL (4) (IP) -> W, increment IP

NEXT-HL:

LD E,(HL) (7)

INC HL (6)

LD D,(HL) (7)

INC HL (6)

EX DE,HL (4)

JP (HL) (4) jump to the address in W

Note that cells are stored in memory low byte first. Also, although it might seem advantageous to keep IP in HL, it really is not, because the Z80 cannot do JP (DE). The NEXT-HL entry point will be used shortly.
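As a rough check of the first DTC-NEXT listing, the following Python sketch steps through the same byte fetches and adds up the cycle counts shown in parentheses. The function name and the sample addresses are hypothetical; only the instruction sequence and timings come from the listing.

```python
# Simulates the first Z80 DTC-NEXT listing: fetch a low-byte-first cell at IP (DE),
# advance IP by two bytes, and produce the jump target in W (HL).

# LD A,(DE) / LD L,A / INC DE / LD A,(DE) / LD H,A / INC DE / JP (HL)
NEXT_CYCLES = [7, 4, 6, 7, 4, 6, 4]

def z80_dtc_next(mem, de):
    l = mem[de]                  # LD A,(DE) ; LD L,A
    de = (de + 1) & 0xFFFF       # INC DE
    h = mem[de]                  # LD A,(DE) ; LD H,A
    de = (de + 1) & 0xFFFF       # INC DE
    hl = (h << 8) | l            # W now holds the address to jump to
    return hl, de, sum(NEXT_CYCLES)

mem = bytearray(65536)
mem[0x8000] = 0x34               # low byte of the thread cell
mem[0x8001] = 0x12               # high byte of the thread cell
w, ip, cycles = z80_dtc_next(mem, 0x8000)
```

The cycle total produced here matches the (38) that the ENTER listing later charges for the NEXT macro.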

For comparison, let us look at an ITC NEXT. The pseudo-code given previously requires another temporary register "X", whose contents are used for the indirect jump. Let DE = X and BC = IP; TOS will have to be kept in memory.

ITC-NEXT:

LD A,(BC) (7) (IP) -> W, increment IP

LD L,A (4)

INC BC (6)

LD A,(BC) (7)

LD H,A (4)

INC BC (6)

LD E,(HL) (7) (W) -> X

INC HL (6)

LD D,(HL) (7)

EX DE,HL (4) jump to the address in X

JP (HL) (4)

This leaves "W+1" in the DE register. As long as this is done consistently there is no problem: code that needs the parameter field address knows where to find it and how much to adjust it.

The ITC NEXT is 11 instructions, as opposed to 7 for DTC. ITC on the Z80 also loses the ability to keep TOS in a register, so I choose DTC.

If coded in-line, the DTC NEXT requires seven bytes in every CODE word. A jump to a common NEXT routine would use only three bytes, but would add 10 clock cycles. This is a close call; I opt for speed, with an in-line NEXT. But sometimes NEXT is so large, or memory so tight, that the prudent decision is to use a jump to NEXT in all CODE words.

Now let us look at the code for ENTER. Using a CALL, the hardware stack is popped to get the parameter field address:

CALL ENTER (17)

...

ENTER:

DEC IX (10) push the old IP onto the return stack

LD (IX+0),D (19)

DEC IX (10)

LD (IX+0),E (19)

POP DE (10) parameter field address -> IP

NEXT (38) assembler macro for 7 instructions

Actually, it is faster to POP HL and then use the last six instructions of NEXT (omitting the EX DE,HL):

CALL ENTER (17)

...

ENTER:

DEC IX (10) push the old IP onto the return stack

LD (IX+0),D (19)

DEC IX (10)

LD (IX+0),E (19)

POP HL (10) parameter field address -> HL

NEXT-HL (34) see the DTC NEXT code above

Total: 119 cycles

When a JP is used, the W register (HL) is left pointing to the code field. The parameter field is three bytes after that:

JP ENTER (10)

...

ENTER:

DEC IX (10) push the old IP onto the return stack

LD (IX+0),D (19)

DEC IX (10)

LD (IX+0),E (19)

INC HL (6) parameter field address -> IP

INC HL (6)

INC HL (6)

NEXT-HL (34)

Total: 120 cycles

Again, because of the alternate entry point for NEXT, the new value of IP does not actually have to be put into the DE register.

The CALL version is one cycle faster. On an embedded Z80, one-byte RST instructions could be used to gain both speed and space, but on many Z80-based personal computers this option is not available (the operating system uses this feature: system calls enter through this interface).

Case Study 4: Intel 8086

Intel's 8086 is another instructive CPU. Rather than walk through the design process in detail, let us look at a recent shareware Forth for the PC: Pygmy Forth [SER90]. Pygmy is a direct-threaded Forth with the top of stack kept in a register. The 8086 register assignments are:

AX = W          DI = scratch

BX = TOS        SI = IP

CX = scratch    BP = RSP

DX = scratch    SP = PSP

Many 8086 Forths use the SI register for IP, so that NEXT can be written with the LODSW instruction. In Pygmy, the DTC NEXT is:

LODSW

JMP AX

This is short enough to include in-line in every CODE word.

High-level Forth "defining words" use a JMP (relative) instruction to the machine code in the code field. The ENTER routine (called 'docol' in Pygmy) must therefore get the parameter field address from W:

ENTER:

XCHG SP,BP

PUSH SI

XCHG SP,BP

ADD AX,3 parameter field address -> IP

MOV SI,AX

NEXT

Note the use of XCHG to swap the two stack pointers. This allows PUSH and POP instructions to be used for both stacks, which is faster than using indirect access through BP.

EXIT:

XCHG SP,BP

POP SI

XCHG SP,BP

NEXT
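The effect of this ENTER and EXIT pair on IP and the return stack can be modelled in a few lines of Python. This is only a sketch: the vm dictionary, its keys, and the sample addresses are hypothetical, and the XCHG/PUSH/XCHG sequence is reduced to a list append.

```python
# Toy model of Pygmy's ENTER and EXIT as listed above.
# The vm dict and its keys are hypothetical; the return stack is a Python list.

CODE_FIELD_SIZE = 3   # the JMP (relative) in the code field occupies 3 bytes

def enter(vm):
    vm["rstack"].append(vm["si"])            # XCHG SP,BP / PUSH SI / XCHG SP,BP
    vm["si"] = vm["ax"] + CODE_FIELD_SIZE    # ADD AX,3 / MOV SI,AX

def exit_(vm):
    vm["si"] = vm["rstack"].pop()            # XCHG SP,BP / POP SI / XCHG SP,BP

vm = {"si": 0x1002, "ax": 0x2000, "rstack": []}
enter(vm)
inside = vm["si"]     # IP now points at the colon definition's parameter field
exit_(vm)
outside = vm["si"]    # IP is back in the calling thread
```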

Segment model

Pygmy Forth is a single-segment Forth: all code and data are contained in a single 64 Kbyte segment, which corresponds to the "tiny" model in Turbo C. The Forth standards so far assume that everything is contained in a single memory address space, accessed with the same fetch and store operators. Nevertheless, IBM PC Forths are beginning to appear that use multiple segments for up to five different kinds of data:

CODE ... machine code

LIST ... high-level Forth threads (so this segment is also called THREADS)

HEAD ... headers of all words

STACK ... parameter and return stacks

DATA ... variables and user-defined data

This allows PC Forths to break the 64 Kbyte limit without the expense of implementing a 32-bit Forth on a 16-bit CPU. However, the implementation of a multi-segment model, and its ramifications for the Forth kernel, are far beyond the scope of this article.

References

[KEL92] Kelly, Guy M., "Forth Systems Comparisons," Forth Dimensions XIII:6 (Mar/Apr 1992). Also published in the 1991 FORML Conference Proceedings. Both available from the Forth Interest Group, P.O. Box 2154, Oakland, CA 94621. Illustrates design tradeoffs of many 8086 Forths with code fragments and benchmarks; highly recommended!

[MOT83] Motorola Inc., 8-Bit Microprocessor and Peripheral Data, Motorola data book (1983).

[SIG92] Signetics Inc., 80C51-Based 8-Bit Microcontrollers, Signetics data book (1992).

Forth Implementations

[SEY89] Seywerd, H., Elehew, W. R., and Caven, P., LOVE-83Forth for the IBM PC, version 1.20 (1989). A shareware Forth using a five-segment model. Contact Seywerd Associates, 265 Scarboro Cres., Scarborough, Ontario M1M 2J7 Canada.

Part III Decryption DOES>

Correction

There is a colossal mistake in one of my MC6809 design decisions in the previous part. It became evident when I started to code the Forth word EXECUTE.

EXECUTE causes the execution of a single Forth word, whose address is given on the parameter stack. More precisely, the compilation address, also known as the code field address, is given on the stack. This can be any kind of Forth word: CODE definition, colon definition, CONSTANT, VARIABLE, or defined word. Unlike the usual Forth interpretation process, the address of the word to execute is given on the stack, not taken from the "thread" (as pointed to by IP).

In our direct-threaded MC6809 code this is easily coded:

EXECUTE:

TFR TOS,W put the address of the word in W

PULU TOS pop the new TOS

JMP ,W jump to the address given in W

Note: this is JMP ,W and not JMP [,W], because we already have the code address of the word; we are not fetching it from the high-level thread. (If TOS were not in a register, EXECUTE could be done simply with JMP [,PSP++].) Now suppose that this executed word is a colon definition. W will point to its code field, which contains JMP ENTER. This does the following:

JMP ENTER

...

ENTER:

PSHS IP

LDX -2,IP re-fetch the code field address

LEAY 3,X

NEXT

This is the mistake! We are not executing this word from within a thread, so IP is not pointing to a copy of its code field address. (Remember: the address of the word to EXECUTE came from the stack.) This form of ENTER cannot work with EXECUTE, because there is no way to find the address of the word being executed.
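The failure can be reproduced with a small Python model. This is only an illustration under made-up addresses: CFA_FOO and CFA_EXECUTE are hypothetical code field addresses, and memory is a dictionary of 16-bit cells keyed by byte address.

```python
# Why the JMP version of ENTER breaks under EXECUTE (toy model).
# Addresses are hypothetical; 16-bit cells are stored in a dict keyed by byte address.

CFA_FOO = 0x2000       # a colon definition whose code field holds JMP ENTER (3 bytes)
CFA_EXECUTE = 0x3000   # the CODE word EXECUTE

def enter_jmp_version(mem, ip):
    """ENTER as in the JMP listing: LDX -2,IP then LEAY 3,X."""
    cfa = mem[ip - 2]      # re-fetch the cell that NEXT just consumed
    return cfa + 3         # assumed parameter field address, becomes the new IP

ip_after_next = 0x1002     # IP after NEXT has consumed the thread cell at 0x1000

# Case 1: FOO is reached through the thread, so the cell at IP-2 holds FOO's CFA.
pfa_threaded = enter_jmp_version({0x1000: CFA_FOO}, ip_after_next)

# Case 2: the thread cell holds EXECUTE; FOO's address came from the stack instead.
pfa_executed = enter_jmp_version({0x1000: CFA_EXECUTE}, ip_after_next)
```

In the second case ENTER computes an address inside EXECUTE rather than the parameter field of FOO, which is exactly the bug described above.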

This suggests a new general rule for DTC Forths: if NEXT does not leave the address of the word being executed in a register, you must use a Call in the code field. So the MC6809 Forth is back to using a JSR in the code field. But ENTER is one of the most-used code fragments in Forth, so to avoid the speed penalty I have completed the "exercise for the student" from the previous part. Note what happens when you swap the registers assigned to RSP and PSP:

The new version executes in 31 cycles, the same as the JMP version I had wanted to use. The improvement comes about because the JSR version of ENTER must use both Forth's return stack and the MC6809 subroutine-return stack (the "JSR stack"). Using two different stack pointers means we do not have to swap the top of the "JSR stack" with IP, which eliminates the need for a temporary register.

This illustrates the usual development process for a new Forth kernel: make some design decisions, write some sample code, discover a bug or a better way to do things, change some design decisions, rewrite the sample code, and repeat until satisfied.

This teaches an important lesson: make EXECUTE one of your benchmark words.

Carey Bloodworth of Van Buren, AR has pointed out a minor but embarrassing mistake in my MC6809 code in the previous part.

For the "TOS in memory" version of 0=, I showed this code fragment:

LDD ,PSP

CMPD #0

to test whether the top of stack is zero. In this case the CMPD instruction is completely superfluous, because the LDD instruction sets the Zero flag when D is zero. (The TOS-in-D version still requires the CMPD instruction, but remains faster than the TOS-in-memory version.)

Now, on to our main topic.

What is a code field?

The DOES> concept seems to be one of the most misunderstood and mystifying aspects of Forth. Yet DOES> is also one of Forth's most powerful features; in many ways, it anticipated object-oriented programming. The action and power of DOES> hinge on a brilliant innovation of Forth: the code field.

Recall from Part 1 that the body of a Forth definition consists of two parts: the code field and the parameter field. You can think of these two fields in several ways:

• The code field is the "action" taken by this Forth word, and the parameter field is the data on which it acts;

• The code field is a subroutine call, and the parameter field holds the parameters included "in-line" after the call (the assembly language programmer's view);

• The code field is the single "method" for this "class" of words, and the parameter field holds the "instance variables" for this particular word (the object-oriented programmer's view).

All these views share some common features:

• The code field routine is always called with at least one argument, namely the address of the parameter field of the Forth word being executed. The parameter field may contain any number of parameters;

• There are relatively few distinct actions, that is, the code field references only a few distinct routines (with the exception of CODE words, as we will see later). Recall the ENTER routine from Part 2: this one common routine is used by all Forth colon definitions;

• The interpretation of the parameter field is implicitly determined by the contents of the code field. That is, each code field routine expects the parameter field to contain a certain kind of data.

A typical Forth kernel has several code field routines predefined.

What makes Forth so powerful is that a Forth program is not limited to this set of code field routines (or to whatever set your Forth kernel provides). The programmer can define new code field routines, and new parameter field types to match. In object-oriented terms, new "classes" and "methods" can be created (although each class has only one method). And, like Forth words themselves, code field actions can be defined either in assembly language or in high-level Forth.

To understand the mechanism of the code field, and how parameters are passed, we first look at the case of assembly language (machine code) actions. We start with indirect threading (ITC), since it is the easiest to understand, and then see how the logic is modified for direct threading (DTC) and subroutine threading (STC). Finally, we look at how a code field action can be written in high-level Forth.

Forth writers are somewhat inconsistent in their terminology, so I will define my terms, using the ITC Forth word shown in Figure 1. The header contains the dictionary information and is not involved in the execution of the Forth word. The body is the "working" part of the word, and consists of a fixed-length code field and a variable-length parameter field. For any given word, the locations of these two fields in memory are the code field address (CFA) and the parameter field address (PFA), respectively. The code field address of a word is the address in memory where its code field is located. Do not confuse this with the contents of the code field, which in an ITC Forth is another, different address.

To be specific, the contents of the code field are the address of a fragment of machine code somewhere else in memory. I will refer to this as the code address. Later, when DTC and STC Forths are discussed, I will also refer to the "code field contents", which will include more than just the code address.

Figure 1 An ITC Forth word

Machine-code actions

Forth CONSTANTs are probably the simplest example of a machine-code action. Let us consider some French constants:

1 CONSTANT UN

2 CONSTANT DEUX

3 CONSTANT TROIS

Executing UN pushes the value 1 onto the Forth parameter stack, executing DEUX pushes 2, and so on. (Do not confuse the parameter stack with the parameter field; they are entirely separate.)

The Forth kernel contains a single word called CONSTANT. It is not itself a constant-type word; it is a high-level Forth definition. CONSTANT is a "defining word": it creates new words in the Forth dictionary. Here it creates the new "constant-type" words UN, DEUX, and TROIS. (You may think of these as "instances" of the "class" CONSTANT.) The code fields of these three words all point to the same machine code fragment, which performs the action of CONSTANT.

What must this code fragment do? Figure 2 shows the memory representation of the three constants. All three words point to a common action routine. The words differ only in their parameter fields, which simply hold the constant values ("instance variables" in object-oriented terms). So the action of these three words should be to fetch the contents of the parameter field and push it onto the stack. The code implicitly assumes that the parameter field holds a single-cell value.

Figure 2 Three constants

To write a machine code fragment that does this, we need to know how to find the parameter field address after the Forth interpreter has jumped to the machine code. That is, how is the PFA passed to the machine code routine? This in turn depends on how the Forth interpreter NEXT is coded, which varies from implementation to implementation. To write machine-code actions, we must first understand NEXT.

The ITC NEXT was described in Part 1. Here is one implementation for the MC6809, using Y = IP and X = W:

NEXT:

LDX ,Y++ ; (IP) -> W, IP+2 -> IP

JMP [,X] ; (W) -> temp, JMP (temp)

Suppose that we are in a high-level thread

... SWAP DEUX + ...

with the interpreter pointer (IP) pointing to the DEUX "instruction" when NEXT is executed (this would be at the very end of SWAP). Figure 3 illustrates what happens. IP (register Y) points to a memory cell inside the high-level thread, which contains the address of the Forth word DEUX. To be precise, this cell contains the code field address of DEUX. So when we fetch a cell using Y, and auto-increment Y, we fetch the code field address of DEUX. This goes into W (register X), so W now points to the code field, whose contents are the address of a machine code fragment. We can fetch the contents of this cell and jump to the machine code with a single MC6809 instruction. This leaves register X unchanged, so W still points to the CFA of DEUX. That is how the parameter field address is obtained: in this case it is simply two bytes past the code field.

Figure 3 ITC before and after NEXT

So the machine code fragment only has to add 2 to W, fetch the cell at that address, and push it onto the stack. This fragment is often called DOCON:

DOCON:

LDD 2,X ; fetch the cell at W+2

PSHU D ; push it onto the parameter stack

NEXT ; (macro) do the next high-level word

In this example, TOS is kept in memory. Note that the previous NEXT incremented IP by 2, so IP is already pointing to the next cell in the thread (the "CFA of +") when DOCON does NEXT.
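The way the three constants share one action routine can be sketched in Python. This is a toy model rather than real 6809 code: memory is a dictionary of cells keyed by byte address, Python functions stand in for machine code, and every address and helper name (DOCON_ADDR, define_constant, run_thread) is hypothetical.

```python
# Toy ITC model: three CONSTANT words sharing the single DOCON fragment.
# Addresses and helper names are hypothetical; Python functions stand in
# for machine code fragments.

DOCON_ADDR = 0x0100            # pretend address of the shared DOCON fragment

def docon(vm):
    # LDD 2,X ; PSHU D : fetch the cell at W+2 (the PFA) and push it
    vm["stack"].append(vm["mem"][vm["w"] + 2])

CODE = {DOCON_ADDR: docon}     # code address -> "machine code"

def define_constant(mem, cfa, value):
    mem[cfa] = DOCON_ADDR      # code field: the code address of DOCON
    mem[cfa + 2] = value       # parameter field: the constant's value

def run_thread(mem, ip, count):
    vm = {"mem": mem, "stack": [], "w": 0}
    for _ in range(count):
        vm["w"] = mem[ip]              # NEXT: (IP) -> W
        ip += 2                        #       IP+2 -> IP
        CODE[mem[vm["w"]]](vm)         #       jump to the code address found at (W)
    return vm["stack"]

mem = {}
define_constant(mem, 0x2000, 1)        # 1 CONSTANT UN
define_constant(mem, 0x2004, 2)        # 2 CONSTANT DEUX
define_constant(mem, 0x2008, 3)        # 3 CONSTANT TROIS
mem[0x3000], mem[0x3002], mem[0x3004] = 0x2000, 0x2004, 0x2008   # thread: UN DEUX TROIS
result = run_thread(mem, 0x3000, 3)
print(result)                          # [1, 2, 3]
```

Only the parameter fields differ between the three words; the code field of each holds the same code address.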

Generally, an ITC Forth leaves the Parameter Field Address, or some "nearby" address, in the W register. In this case W contains the CFA, which in this Forth implementation is always PFA-2. Since every class of Forth word except CODE words needs the Parameter Field Address, many implementations of NEXT increment W so that it points to the PFA. On the MC6809 this takes one small change:

NEXT:   LDX ,Y++        ; (IP) -> W, IP+2 -> IP
        JMP [,X++]      ; (W) -> temp, JMP (temp), W+2 -> W

This adds 3 clock cycles to NEXT, but leaves the Parameter Field Address in W. What does it do to the code field routines?

        W = CFA                 W = PFA

DOCON:  LDD 2,X    (6)          LDD ,X     (5)
        PSHU D                  PSHU D
        NEXT                    NEXT

DOVAR:  LEAX 2,X   (5)          ; no operation
        PSHU X                  PSHU X
        NEXT                    NEXT

ENTER:  PSHS Y                  PSHS Y
        LEAY 2,X   (5)          LEAY ,X    (4, faster than TFR X,Y)
        NEXT                    NEXT

What do we gain for the 3-cycle penalty in NEXT? DOCON is one cycle shorter, DOVAR five cycles shorter, and ENTER one cycle shorter. CODE words do not use the value in W, so they gain nothing from the autoincrement. Whether speed is gained or lost overall depends on the mix of Forth words executed. The usual rule is that most of the words executed are CODE words, so incrementing W in NEXT costs a little speed. (There is a memory saving, but DOCON, DOVAR and ENTER each appear only once, so the gain is slight.) The best decision, of course, depends on the processor. On a machine like the Z80, which accesses memory only by bytes and has no autoincrement addressing modes, it is usually best to leave W pointing to IP+1 (the last byte fetched from the code field). On other machines autoincrement is "free", and leaving W pointing to the parameter field is the most convenient.
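The trade-off can be put into numbers. The sketch below is only an illustration in Python: the per-routine cycle differences are the ones quoted above for the MC6809, while the two word mixes are invented for the example and are not measurements of any real program.

```python
# Cycle changes when NEXT autoincrements W, taken from the comparison above.
NEXT_PENALTY = 3
SAVINGS = {"DOCON": 1, "DOVAR": 5, "ENTER": 1, "CODE": 0}


def net_cycle_change(mix):
    """Net cycles added (positive) or saved (negative) for a given word mix.

    `mix` maps a word class to the number of times words of that class run.
    """
    executed = sum(mix.values())   # every executed word passes through NEXT
    saved = sum(SAVINGS[kind] * count for kind, count in mix.items())
    return NEXT_PENALTY * executed - saved


# Hypothetical mixes, for illustration only.
code_heavy = {"CODE": 700, "DOCON": 100, "DOVAR": 100, "ENTER": 100}
variable_heavy = {"CODE": 100, "DOCON": 0, "DOVAR": 900, "ENTER": 0}

print(net_cycle_change(code_heavy))      # 2300
print(net_cycle_change(variable_heavy))  # -1500
```

With the CODE-heavy mix the autoincrement is a net loss; only a program dominated by variable references would come out ahead, which matches the usual rule stated above.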

Note: this decision must be applied consistently throughout a system. If NEXT leaves W pointing to the PFA of the word being executed, then EXECUTE must do the same. (This is the mistake I corrected at the start of this article.)

Direct threading

Direct threaded code (DTC) differs from ITC in only one respect, the contents of the code field: instead of the address of some machine code, the code field holds a JUMP or CALL to that machine code. This may make the code field larger (one byte longer on the MC6809, for example), but it removes one level of indirection from the NEXT routine.

Whether to put a JUMP or a CALL in the code field depends on how the machine code routine obtains the Parameter Field Address. To jump to the code field, many CPUs require its address to be in a register. For example, the indirect jump on the Intel 8086 is JMP AX (or some other register), and on the Z80 it is JP (HL) (or IX or IY). On these processors the DTC NEXT takes two operations, which on the MC6809 would be:

NEXT:   LDX ,Y++        ; (IP) -> W, IP+2 -> IP
        JMP ,X          ; JMP (W)

(On the Intel 8086 these two operations could be LODSW and JMP AX.) The effect is shown as case 1 in Figure 4. The Code Field Address of DEUX is fetched from the high-level thread and IP is incremented. Then, instead of a second fetch, a JUMP is made to the Code Field Address; that is, the CPU jumps directly to the code field. The CFA is left in the W register, just as in the first ITC example above. Since this address is already in a register, we can simply put a JUMP to DOCON in the code field, and the DOCON fragment works exactly as before.

Figure 4: DTC before and after NEXT

However, on some processors, such as the MC6809 and the PDP-11, this DTC NEXT can be done in a single instruction:

NEXT:   JMP [,Y++]      ; (IP) -> temp, IP+2 -> IP, JMP (temp)

This also makes the CPU jump to the code field of DEUX. But there is one big difference: the CFA is not left in any register! So how does the machine code fragment find the Parameter Field Address? The answer is to put a CALL (JSR) in the code field instead of a JUMP. On most CPUs, the CALL instruction pushes the return address, that is, the address immediately following the CALL instruction, onto the return stack. As case 2 in Figure 4 shows, this return address is exactly the Parameter Field Address we want. So all DOCON has to do is pop the return stack, which balances the JSR in the code field, and then use that address to fetch the constant:

DOCON:  PULS X          ; pop the PFA from the return stack
        LDD ,X          ; fetch the parameter field cell
        PSHU D          ; push it onto the parameter stack
        NEXT            ; (macro) go on to the next high-level word

Compare this with the ITC version. DOCON has gained one instruction, but NEXT has lost one. DOVAR and ENTER likewise become one instruction longer:

DOVAR:  PULS X          ; pop the PFA of the word
        PSHU X          ; push that address onto the parameter stack
        NEXT

ENTER:  PULS X          ; pop the PFA of the word
        PSHS Y          ; push the old IP
        TFR X,Y         ; the PFA becomes the new IP
        NEXT

Now go back to the beginning of this article and reread my correction, to see why we cannot simply re-fetch the CFA through IP. Note also that these routines would differ if Forth's stack pointers were assigned to the MC6809's U and S registers the other way round.
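The way a JSR in the code field delivers the PFA can also be traced in a few lines. This is a hedged Python sketch of the "case 2" arrangement, not the author's code: the code field is modelled as a tagged tuple rather than real opcode bytes, the addresses are hypothetical, and HALT is an invented CODE word used only to stop the loop.

```python
CELL = 2       # bytes per thread cell
JSR_SIZE = 3   # opcode byte plus 16-bit address, as for the 6809 extended JSR


def run_dtc_demo():
    mem = {}        # address -> thread cell, constant, or code-field instruction
    s_stack = []    # CPU stack (S): receives the JSR return addresses
    u_stack = []    # parameter stack (U)
    state = {"ip": 0x300, "running": True}

    def docon():
        pfa = s_stack.pop()         # PULS X : the return address is the PFA
        u_stack.append(mem[pfa])    # LDD ,X / PSHU D

    def halt():                     # hypothetical CODE word ending the demo
        state["running"] = False

    # DEUX: the code field holds "JSR DOCON"; the parameter field follows it.
    mem[0x200] = ("JSR", docon)
    mem[0x200 + JSR_SIZE] = 2
    # HALT: a CODE word, its machine code starts right at the code field.
    mem[0x210] = ("CODE", halt)
    # High-level thread: DEUX HALT
    mem[0x300] = 0x200
    mem[0x302] = 0x210

    while state["running"]:
        # NEXT:  JMP [,Y++]  : (IP) -> temp, IP+2 -> IP, JMP (temp)
        cfa = mem[state["ip"]]
        state["ip"] += CELL
        kind, routine = mem[cfa]
        if kind == "JSR":
            s_stack.append(cfa + JSR_SIZE)   # JSR pushes the address after itself
        routine()
    return u_stack, s_stack


print(run_dtc_demo())  # prints ([2], [])
```

No register ever holds the CFA here; the only route to the constant is the return address that the JSR left on the CPU stack, and DOCON's pop leaves that stack balanced.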

Subroutine threading

Subroutine threaded code (STC) resembles DTC in that the CPU jumps directly to the code field of a Forth word. But now there is no NEXT code, no IP register and no W register. So there is no choice but to use a JSR in the code field, since this is the only way to obtain the Parameter Field Address. This is shown in Figure 5.

Figure 5: Subroutine threaded code

The high-level "thread" is a series of subroutine calls executed directly by the CPU. When JSR DEUX is executed, the address of the next instruction in the thread is pushed onto the return stack. Then the JSR DOCON inside the word DEUX is executed, which pushes another return address, the PFA of DEUX, onto the return stack. DOCON can pop that address, use it to fetch the constant, push the constant onto the parameter stack, and then return to the thread with an RTS:

DOCON:  PULS X          ; pop the PFA from the return stack
        LDD ,X          ; fetch the parameter field cell
        PSHU D          ; push it onto the parameter stack
        RTS             ; return to the thread: do the next high-level word

We can still speak of a code field and a parameter field in subroutine threaded code. For every class of Forth word except CODE words and colon definitions, the code field is the space occupied by the JSR or CALL instruction (just as in DTC), and the parameter field is whatever follows it. So on the MC6809 the PFA equals CFA+3. For CODE words and colon definitions the meaning of "parameter field" becomes somewhat fuzzy, as will be seen later.

Special case: CODE words

There is one significant exception to all of the generalizations above: the CODE definition, a Forth word defined as a machine code subroutine. This wonderful capability, defining a word in assembly language, is trivially easy to implement in Forth, because every Forth word executes some piece of machine code.

The machine code that makes up a CODE word is always contained in the body of the Forth word. In an indirect threaded Forth, the code field must contain the address of the machine code to be executed. So the machine code is placed in the parameter field, and the code field contains the address of the parameter field, as shown in Figure 6.

Figure 6: CODE word

In direct and subroutine threaded Forths we could, by analogy, put a JUMP to the parameter field in the code field. But this would be pointless, since the parameter field immediately follows the code field. The code field could be filled with NOPs for the same result. Better still, the machine code can start at the code field and continue into the parameter field. At that point the distinction between code field and parameter field disappears. This is not a problem, because CODE words do not need the distinction. (It does matter for decompilers and for certain clever programming tricks, which we do not discuss here.)

CODE words, however they are implemented, are the one case in which the machine code action does not need to be passed the Parameter Field Address. The parameter field contains not data but the code being executed. Only NEXT needs to know this address (or the Code Field Address), so that it can jump to the machine code.

Using ;CODE

Three questions remain unanswered:

• How do we create a new Forth word that holds arbitrary data in its parameter field?

• How do we change the code field of that word so that it points to machine code of our choosing?

• How do we compile (assemble) this machine code fragment so that it exists separately from the words that use it?

The answer to the first question is: we write a Forth word to do it. Because this word, when executed, defines a new word in the Forth dictionary, it is called a "defining word".

CONSTANT is one example of a defining word. All the hard work of a defining word is done by a kernel word, CREATE, which parses a name from the input stream, builds the header and code field for the new word, and links it into the dictionary. All that remains for the programmer is to build the parameter field.

The answers to the second and third questions are embodied in two rather convoluted Forth words, (;CODE) and ;CODE respectively. To understand how they work, let us look at how the defining word CONSTANT is actually written in high-level Forth, using the earlier ITC MC6809 example:

: CONSTANT ( n -- )
    CREATE          \ create the new word
    ,               \ append the TOS value to the dictionary,
                    \ as the first cell of the parameter field
    ;CODE           \ end the high-level part, start assembler code
    LDD 2,X         \ the DOCON code fragment
    PSHU D
    NEXT
END-CODE

This Forth word has two parts. Everything from : CONSTANT to ;CODE is the high-level Forth code executed when the word CONSTANT is invoked. Everything from ;CODE to END-CODE is the machine code executed when the "children" of CONSTANT, constant-class words such as UN and DEUX, are executed. In other words, the part from ;CODE to END-CODE is the code fragment to which the code field of every constant-class word will point. The name ;CODE signifies that it ends a high-level definition (";") and begins a machine code definition ("CODE"). However, this does not put two separate words into the dictionary: everything from : CONSTANT to END-CODE is contained in the parameter field of CONSTANT, as shown in Figure 7.

Figure 7: ITC ;CODE

Derick and Baker [DER82] name three "sequences" that help in understanding the action of defining words:

Sequence 1

This is when the word CONSTANT is being defined. It involves both the high-level compiler (for the first part) and the Forth assembler (for the second part). This is when the definition of CONSTANT shown in Figure 7 is added to the dictionary. As we will see, ;CODE, a compiler directive, is executed during Sequence 1.

Sequence 2

This is when the word CONSTANT is being executed and some constant-class word is being defined, as in:

2 CONSTANT DEUX

Sequence 2 is when CONSTANT executes and the word DEUX is added to the dictionary. During this sequence the high-level part of CONSTANT is executed, including the word (;CODE).

Sequence 3

This is when the constant-class word is executed. In our example, Sequence 3 is when DEUX is executed and pushes the value 2 onto the stack. This is when the machine code part of CONSTANT is executed (recall that this fragment is the code field action of DEUX).

The words ;CODE and (;CODE)

;CODE is executed during Sequence 1, when CONSTANT is compiled. It is an example of a Forth IMMEDIATE word, a word that is executed during Forth compilation.

;CODE does three things:

• it compiles the Forth word (;CODE) into CONSTANT;

• it turns off the Forth compiler;

• it turns on the Forth assembler.

(;CODE) is part of the word CONSTANT, so it executes when CONSTANT executes (Sequence 2). It performs the following actions:

• It gets the address of the machine code that immediately follows it. This is done by popping IP from the Forth return stack.

• It puts that address into the code field of the word just defined by CREATE. The address of that word is obtained with the Forth word LAST (sometimes called LATEST).

• It performs the action of EXIT (also known as ;S), so that the Forth inner interpreter does not try to execute the machine code that follows as part of the Forth thread. This is the high-level "subroutine return" that ends a Forth thread.

F83 [LAX84] shows how these words are typically coded in Forth:

: ;CODE
    COMPILE (;CODE)     \ compile (;CODE) into the definition
    ?CSP [COMPILE] [    \ turn off the Forth compiler
    REVEAL              \ (just as ";" does)
    ASSEMBLER           \ turn on the assembler
    ; IMMEDIATE         \ make this an IMMEDIATE word

: (;CODE)
    R>                  \ pop the address of the machine code
    LAST @ NAME>        \ get the code field address of the latest word
    !                   \ store the code address in the code field
    ;

(;CODE) is the more subtle of the two words. Because it is a high-level Forth definition, the address that follows it in the CONSTANT thread, the high-level "return address", is pushed onto Forth's return stack when it is entered. So popping the return stack inside (;CODE) yields the address of the machine code that follows. Popping this value also bypasses one level of high-level subroutine return, so that when (;CODE) exits, it exits to the caller of CONSTANT. This is equivalent to returning to CONSTANT and having CONSTANT return immediately. Walk through the execution of CONSTANT and (;CODE) using Figure 7 to see how this works.
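The return-stack manipulation is easier to follow with concrete values. Below is a toy Python model of what (;CODE) does during Sequence 2. It is a sketch under stated assumptions, not F83 code: the return stack is a plain list, the dictionary entry is a small dict, and all addresses are hypothetical.

```python
def paren_semicode(return_stack, latest_word):
    """Toy model of (;CODE), run while CONSTANT is executing (Sequence 2).

    On entry the top of the return stack is the address just after (;CODE)
    in CONSTANT's thread, which is where the machine code fragment begins.
    Below it is the return address into whoever called CONSTANT.
    """
    code_address = return_stack.pop()          # R>
    latest_word["code_field"] = code_address   # LAST @ NAME> !
    return return_stack.pop()                  # ";" : EXIT resumes CONSTANT's caller


# Hypothetical addresses, for illustration only.
CALLER_RESUME = 0x4004     # where the caller of CONSTANT continues
DOCON_FRAGMENT = 0x2010    # machine code placed after (;CODE) in CONSTANT
deux = {"name": "DEUX", "code_field": None, "parameter_field": [2]}

rstack = [CALLER_RESUME, DOCON_FRAGMENT]
new_ip = paren_semicode(rstack, deux)
print(hex(deux["code_field"]), hex(new_ip), rstack)  # 0x2010 0x4004 []
```

The single pop both finds the machine code and discards the return into CONSTANT, so the final EXIT lands in CONSTANT's caller and the machine code is never interpreted as a thread.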

Direct and subroutine threading

For DTC and STC, the actions of ;CODE and (;CODE) are the same as for ITC, with one important exception: instead of holding an address, the code field holds a JUMP or CALL instruction. For an absolute JUMP or CALL, probably the only change needed is to store the address at the end of the code field, as the operand of the JUMP or CALL instruction. On the MC6809, the address is stored as the last two bytes of the three-byte JSR instruction. But some Forths, such as Pygmy Forth on the Intel 8086, use a relative branch in the code field. In that case the relative offset must be computed and inserted into the branch instruction.

High-level Forth actions

We have seen how to make a Forth word execute a chosen fragment of assembly language code, and how to pass that fragment the address of the word's parameter field. But how do we write the action routine in high-level Forth?

Every Forth word must, through the action of NEXT, execute some machine language routine. That is what the code field is for. So a machine language routine, or a set of routines, is needed to handle the problem of invoking a high-level action. We will call this routine DODOES.

There are three problems to solve:

• How do we find the address of the high-level action routine associated with this Forth word?

• How do we invoke the Forth interpreter, from machine code, to run a high-level action routine?

• How do we pass that routine the address of the parameter field of the word being executed?

The answer to the third question is easy: arguments are passed to a high-level Forth routine on the parameter stack. Our machine language routine must push the Parameter Field Address onto the stack before it invokes the high-level routine. (From the earlier discussion, we know how the machine language routine can obtain the PFA.)

The answer to the second question is a little harder. Basically we want to do something like the Forth word EXECUTE, which invokes a Forth word, or perhaps ENTER, which invokes a colon definition. Both are among our key kernel words, and the DODOES code will resemble them.

The first question is the tricky one. Where do we put the address of the high-level routine? Remember that the code field does not point to high-level code; it must point to machine code. Two approaches have been used in the history of Forth.

The fig-Forth solution

Fig-Forth reserved the first cell of the parameter field to hold the address of the high-level code. The DODOES routine obtained the Parameter Field Address, pushed the address of the actual data (typically PFA+2) onto the stack, fetched the address of the high-level routine from that first cell, and then performed EXECUTE. This approach has two problems.

First, the structure of the parameter field differs between machine code actions and high-level actions. For example, a CONSTANT with a machine code action stores its data at the PFA, but a CONSTANT with a high-level action must store its data at (typically) PFA+2.

Second, every instance of a high-level action class carries an overhead of one extra cell. That is, if CONSTANT uses a high-level action, every constant defined in the program is one cell larger. Fortunately, clever Forth programmers soon found a way around these problems, and the fig-Forth approach is no longer used.

The modern solution

Most Forths today associate a different machine language fragment with each high-level action routine. So a high-level constant has its code field pointing to a machine language fragment whose sole function is to invoke the high-level action of CONSTANT; a high-level variable has its code field pointing to the "startup" routine for the high-level VARIABLE action, and so on. Does this cause a lot of duplicated code? No, because each of these machine language fragments is just a subroutine call to a common startup routine, DODOES (which is different from the fig-Forth DODOES routine). The address of the high-level code is passed to DODOES as an "inline" subroutine parameter. That is, the address of the high-level code is placed immediately after the JSR/CALL instruction. DODOES can then pop the CPU stack and fetch through the popped address to obtain it.

In fact we can simplify further. The high-level code itself is placed immediately after the JSR/CALL instruction, so DODOES pops the CPU stack and has the address of the code directly. And because we know this is high-level Forth code, we can dispense with its code field and compile only the high-level thread, which in effect folds the action of ENTER into DODOES.

Now each "defined" word simply points to a small piece of machine code, and no space is used in its parameter field. This piece of machine code is a JSR or CALL instruction followed by the high-level action routine. In the MC6809 example, we have traded two bytes in every constant for a three-byte JSR that appears only once.

This is probably the most convoluted logic in the whole Forth kernel. So let us see how it is implemented in practice, using our trusty ITC MC6809 example.

Figure 8 shows the constant DEUX implemented with a high-level action. When the Forth interpreter encounters DEUX, that is, when Forth's IP is at IP(1), it does the usual thing: it fetches the address contained in DEUX's code field and jumps to that address. At that address is a JSR DODOES instruction, so a second jump, this time a subroutine call, follows immediately.

Figure 8: ITC DODOES

DODOES must then perform the following actions:

• Push the address of DEUX's parameter field onto the parameter stack, for later use by the high-level action routine. Because the JSR instruction does not alter any registers, we expect to find DEUX's Parameter Field Address (or a "nearby" address) still in the W register.

• Obtain the address of the high-level action routine by popping the CPU stack. (Recall that popping the CPU stack yields the address of whatever immediately follows the JSR instruction.) This is a high-level thread, like the parameter field part of a colon definition.

• Save the old value of Forth's Interpreter Pointer, IP(2), on Forth's return stack, because the IP register is about to be used to execute the high-level fragment. In essence, DODOES must "nest" the IP, just as ENTER does. Remember that Forth's return stack may not be the same as the CPU's subroutine stack.

• Put the address of the high-level thread into IP. This is IP(3) in Figure 8.

• Perform NEXT to continue high-level interpretation at the new location.

Assume an indirect threaded (ITC) MC6809 Forth with the following properties:

• W is not incremented by NEXT (that is, W contains the CFA of the word entered by NEXT).

• The MC6809 S register is Forth's PSP and the U register is Forth's RSP (that is, the CPU stack is not Forth's return stack).

• The MC6809 Y register is Forth's IP and X is Forth's W.

Recall the definition of NEXT under these conditions:

NEXT:   LDX ,Y++        ; (IP) -> W, IP+2 -> IP
        JMP [,X]        ; (W) -> temp, JMP (temp)

DODOES can then be written like this:

DODOES: LEAX 2,X        ; make W point to the parameter field
        PSHU Y          ; push the old IP onto the return stack
        PULS Y          ; pop the new IP from the CPU stack
        PSHS X          ; push W (the Parameter Field Address) onto the parameter stack
        NEXT            ; invoke the high-level interpreter

These operations are slightly out of sequence relative to the list of actions. As long as the right data ends up on the right stack (or in the right register) at the right time, the exact order of operations is not critical. Here we take advantage of the fact that the old IP can be pushed onto Forth's return stack before the new IP is popped from the CPU stack.

On some processors the CPU stack is used as Forth's return stack. In that case one step involving temporary storage is needed. In the same example, if we had chosen S = RSP and U = PSP, DODOES would become:

DODOES: LEAX 2,X        ; make W point to the parameter field
        PSHU X          ; push W (the Parameter Field Address) onto the parameter stack
        PULS X          ; pop the thread address from the CPU stack
        PSHS Y          ; push the old IP onto the return stack
        TFR X,Y         ; put the address of the new thread into IP
        NEXT            ; invoke the high-level interpreter

Because we are essentially exchanging IP with the top of the return stack (the CPU stack), we must use X as a temporary register. So we must push the PFA onto the parameter stack before X is reused.

Walk through both of these DODOES examples step by step, tracking the contents of the two stacks and all the registers. I always walk through my own DODOES routines, to make sure that no register is overwritten at the wrong time.
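Such a walk-through can be mechanised. The following Python sketch mirrors the two ITC DODOES listings instruction by instruction, with lists standing in for the S and U stacks and integers for the X and Y registers. It is an illustration only; the three addresses are hypothetical.

```python
def dodoes_s_is_psp(x, y, s_stack, u_stack):
    """S = parameter stack (also the CPU stack), U = return stack."""
    x += 2                 # LEAX 2,X : W now points to the parameter field
    u_stack.append(y)      # PSHU Y   : old IP -> return stack
    y = s_stack.pop()      # PULS Y   : new IP <- address after JSR DODOES
    s_stack.append(x)      # PSHS X   : PFA -> parameter stack
    return x, y


def dodoes_s_is_rsp(x, y, s_stack, u_stack):
    """S = return stack (also the CPU stack), U = parameter stack."""
    x += 2                 # LEAX 2,X : W now points to the parameter field
    u_stack.append(x)      # PSHU X   : PFA -> parameter stack
    x = s_stack.pop()      # PULS X   : address after JSR DODOES
    s_stack.append(y)      # PSHS Y   : old IP -> return stack
    y = x                  # TFR X,Y  : thread address -> IP
    return x, y


# Hypothetical addresses: the CFA of DEUX, the old IP, and the high-level
# thread that follows "JSR DODOES" (pushed on the CPU stack by that JSR).
CFA, OLD_IP, ACTION = 0x200, 0x302, 0x153

s1, u1 = [ACTION], []
_, ip1 = dodoes_s_is_psp(CFA, OLD_IP, s1, u1)
s2, u2 = [ACTION], []
_, ip2 = dodoes_s_is_rsp(CFA, OLD_IP, s2, u2)

print(hex(ip1), s1, u1)   # parameter stack is s1, return stack is u1
print(hex(ip2), u2, s2)   # parameter stack is u2, return stack is s2
```

Both variants end in the same logical state: the new IP is the thread address, the parameter stack holds the PFA (CFA+2), and the return stack holds the old IP. Swapping the PSHU X and PULS X lines in the second variant would lose the PFA, which is the clobbering hazard mentioned above.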

Direct threading

The logic of DODOES is the same in DTC Forths. The implementation, however, may differ, depending on whether the DTC Forth uses a JMP or a CALL in the code field of a word.

JMP in the code field. A DTC Forth can use a JMP in the code field if the address of the word being executed is available in a register, most likely as the Code Field Address. From DODOES's point of view this is identical to ITC.

In our example, DODOES knows that the Forth interpreter jumped to the machine code associated with DEUX, and that this code was a JSR to DODOES. It does not matter whether the first jump was direct or indirect; the register and stack contents are the same. So the code for DODOES is identical to the ITC version (although NEXT is of course different, and W may need a different offset to point to the parameter field).

CALL in the code field. In the DTC MC6809 we never explicitly fetch the CFA of the word being executed, so the Forth word must contain a JSR in its code field. Instead of finding the word's Parameter Field Address in a register, we find it on the CPU stack. This case is shown in Figure 9.

Figure 9: DTC DODOES

When IP is at IP(1), the Forth interpreter jumps to the code field of DEUX (and increments IP). In the code field is a JSR to DEUX's machine code fragment, and at that address is a second JSR, to DODOES. So two addresses are pushed onto the CPU stack.

The return address of the first JSR is the Parameter Field Address of DEUX. The return address of the second JSR, which is on top of the CPU stack, is the address of the high-level thread to be executed. DODOES must ensure that the old IP is pushed onto the return stack, that the PFA of DEUX is pushed onto the parameter stack, and that the address of the high-level thread is loaded into IP. This is very sensitive to the stack assignments! For S = PSP (the CPU stack) and U = RSP, the NEXT and DODOES code becomes:

NEXT:   JMP [,Y++]      ; (IP) -> temp, IP+2 -> IP, JMP (temp)

DODOES: PSHU Y          ; push the old IP onto the return stack
        PULS Y          ; pop the new IP from the CPU stack
                        ; note: the CPU stack is the parameter stack, and its
                        ; top element is now the PFA, exactly what we need
        NEXT            ; invoke the high-level interpreter

Check for yourself that the flow through NEXT, DEUX and DODOES pushes a net total of one item, the PFA of DEUX, onto the parameter stack.

Subroutine threading

Figure 10 shows an MC6809 STC example of DEUX with a high-level action. By the time DODOES is entered, three things have been pushed onto the CPU/return stack: the return address in the "main" thread, the PFA of DEUX, and the address of DEUX's high-level action code. DODOES must pop the last two, push the PFA onto the parameter stack, and jump to the action code.

Figure 10: STC DODOES

DODOES for the MC6809 is now a three-instruction routine. It can be simplified even further by expanding JSR DODOES in line, that is, by replacing the JSR DODOES with the equivalent machine code. Since there is one JSR fewer, the stack manipulation reduces to:

        PULS X          ; pop the PFA from the CPU stack
        PSHU X          ; push it onto the parameter stack
        ...             ; the high-level thread for DEUX follows

This replaces a three-byte JSR with four bytes of explicit code, with a considerable gain in execution speed. For the MC6809 this is probably a good choice. For a processor like the 8051, DODOES is long enough that it is better kept as a subroutine.

Using DOES>

With ;CODE we learned how to create a new Forth word with arbitrary data in its parameter field, and how to make that word's code field point to a new machine code fragment. How, then, do we compile a high-level action routine and make a new word point to it?

The answer lies in two Forth words, DOES> and (DOES>), which are the high-level equivalents of ;CODE and (;CODE). To understand them, let us look at an example of their use:

: CONSTANT ( n -- )
    CREATE          \ create the new word
    ,               \ append the TOS value to the dictionary,
                    \ as the first cell of the parameter field
    DOES>           \ end the "create" part, start the "action" part
    @               \ given the PFA, fetch its contents
    ;

Compare this with the earlier ;CODE example, and you can see that DOES> performs a function analogous to ;CODE. Everything from : CONSTANT to DOES> is executed when the word CONSTANT is invoked. This is the code that builds the parameter field of the "defined" word. Everything from DOES> to ; is the high-level code executed when the "children" of CONSTANT (such as DEUX) are invoked, that is, the high-level fragment to which their code fields will point. (We will see that a JSR DODOES is placed before this high-level fragment.)

As with ;CODE, both the "create" and the "action" clauses are contained in the body of the Forth word CONSTANT, as shown in Figure 11.

Figure 11: ITC DODOES

Recall Sequences 1, 2 and 3. The words DOES> and (DOES>) behave as follows.

DOES> is executed during Sequence 1, when CONSTANT is compiled, so it is an IMMEDIATE word. It does two things:

• It compiles the Forth word (DOES>) into CONSTANT.

• It compiles a JSR DODOES into CONSTANT.

Note that DOES> leaves the Forth compiler running, so that the high-level fragment that follows is compiled. Also, although JSR DODOES is not itself Forth code, an IMMEDIATE word such as DOES> can cause it to be compiled in the middle of Forth code.

(DOES>) is part of the word CONSTANT, so it executes when CONSTANT executes (Sequence 2). It does the following:

• It gets the address of the machine code that immediately follows (the JSR DODOES) by popping IP from Forth's return stack.

• It puts that address into the code field of the word just defined by CREATE.

• It performs the action of EXIT, so that CONSTANT terminates here and does not try to execute the fragment that follows.

The action of (DOES>) is identical to that of (;CODE)! So a Forth system does not need to define a separate word. F83, for example, uses (;CODE) in both ;CODE and DOES>. From now on I will write (;CODE) instead of (DOES>).

You have already seen how (;CODE) works. The F83 definition of DOES> is:

: DOES>
    COMPILE (;CODE)     \ compile (;CODE) into the definition
    0E8 C,              \ the CALL opcode byte
    DODOES HERE 2+ - ,  \ the relative offset to DODOES
    ; IMMEDIATE

Here DODOES is a constant that holds the address of the DODOES routine. (The actual F83 source code differs slightly, because of the requirements of the F83 metacompiler.)

DOES> does not need to touch CSP or the smudge bit, because the Forth compiler is left on. On the Intel 8086 the CALL instruction takes a relative address, hence the arithmetic involving DODOES and HERE. On the MC6809, DOES> would look like this:

: DOES>
    COMPILE (;CODE)     \ compile (;CODE) into the definition
    0BD C,              \ the JSR extended opcode byte
    DODOES ,            \ the operand: the address of DODOES
    ; IMMEDIATE

You can see how the machine language JSR DODOES is compiled after the high-level (;CODE) and before the high-level action code.
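The difference between the relative 8086 operand and the absolute 6809 operand can be checked with a little arithmetic. The Python sketch below assembles the three bytes each version of DOES> would lay down. The function names and the addresses are hypothetical, and 16-bit addresses are assumed throughout.

```python
def call_dodoes_8086(opcode_addr, dodoes_addr):
    """Bytes of an 8086 near CALL (opcode E8) assembled at opcode_addr."""
    here = opcode_addr + 1                          # HERE after "0E8 C,"
    offset = (dodoes_addr - (here + 2)) & 0xFFFF    # DODOES HERE 2+ -
    return bytes([0xE8, offset & 0xFF, offset >> 8])   # operand is little-endian


def jsr_dodoes_6809(dodoes_addr):
    """Bytes of a 6809 extended JSR (opcode BD); the operand is absolute."""
    return bytes([0xBD, dodoes_addr >> 8, dodoes_addr & 0xFF])  # big-endian


# Hypothetical addresses, for illustration only.
print(call_dodoes_8086(0x1000, 0x0100).hex())   # e8fdf0
print(jsr_dodoes_6809(0x0100).hex())            # bd0100
```

The 8086 offset is measured from the address of the instruction after the CALL, which is HERE+2 at the moment the operand is compiled; that is why the F83 definition computes DODOES HERE 2+ - before the comma. The 6809 version needs no arithmetic at all.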

Direct and subroutine threading

The only difference in DTC and STC is how the code field is modified to point to the new routine. This is done by (;CODE), and the required changes have already been described. DOES> is not affected at all, unless you are writing an STC Forth and expanding the JSR DODOES into explicit machine code. In that case DOES> is modified to assemble the in-line machine code instead of a JSR to DODOES.

Onward and upward

Who would have thought that so few lines of code would need so much explanation? This is why I admire ;CODE and DOES> so much: I have never seen such intricate, powerful and flexible constructs implemented so economically.

References

[DER82] Derick, Mitch and Baker, Linda, Forth Encyclopedia, Mountain View Press (1982). A word-by-word description of fig-Forth in minute detail. Still available from the Forth Interest Group, P.O. Box 2154, Oakland CA 94621.

[LAX84] Laxen, H. and Perry, M., F83 for the IBM PC, version 2.1.0 (1984). Distributed by the authors, available from the Forth Interest Group or GEnie.

Part IV Compiler or Meta Compiler

Throughout this series I have followed one guiding principle: keep it short. For that reason I have placed the source listings elsewhere, and I apologize for the inconvenience. For now, the main question to discuss is this:

How do you start building a Forth system?

As you now know, most Forth code consists of high-level threads, usually compiled as just a series of addresses. In the early days of fig-Forth, an assembler was often the only programming tool available. Assembly language is fine for writing Forth CODE words, but high-level threads had to be written as a series of DW directives. For example, the Forth word

: MAX ( n n -- n )   OVER OVER < IF SWAP THEN DROP ;

would have to be written as

        DW OVER,OVER,LESS,ZBRAN
        DW MAX2-$
        DW SWAP
MAX2:   DW DROP,SEMIS
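To see how this DW list behaves as a thread, here is a small Python sketch that interprets it. This is an illustration, not fig-Forth code: the cells are list elements rather than 16-bit addresses, so the MAX2-$ offset is counted in cells instead of bytes, and a true flag is represented as 1.

```python
def run_thread(thread, stack):
    """Interpret a list standing in for the DW cells of a high-level thread."""
    ip = 0
    while True:
        op = thread[ip]
        ip += 1
        if op == "OVER":
            stack.append(stack[-2])
        elif op == "LESS":
            b = stack.pop()
            a = stack.pop()
            stack.append(1 if a < b else 0)
        elif op == "ZBRAN":
            if stack.pop() == 0:
                ip += thread[ip]   # branch: offset is relative to the offset cell
            else:
                ip += 1            # no branch: skip over the offset cell
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "DROP":
            stack.pop()
        elif op == "SEMIS":
            return stack


# MAX2 is cell 6 and the offset cell ("$") is cell 4, so MAX2-$ is 2 cells here.
MAX_THREAD = ["OVER", "OVER", "LESS", "ZBRAN", 2, "SWAP", "DROP", "SEMIS"]

print(run_thread(MAX_THREAD, [3, 7]))   # [7]
print(run_thread(MAX_THREAD, [9, 4]))   # [9]
```

Writing the offset cell by hand, as the assembler version requires, is exactly the kind of bookkeeping that IF and THEN perform automatically in the Forth source.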

Later, as working Forth systems became widespread, Forth writers began modifying their Forth compilers into cross-compilers. With Forth running on a CP/M machine (or an Apple II, or any other microcomputer), you could write Forth programs for some other CPU, up to and including an entirely new Forth system for that CPU.

Such compilers are often called "metacompilers" because they create a new Forth from within Forth. Computer science purists object to this name, so some Forth writers use the terms "cross-compiler" and "recompiler" instead. The difference between the two is that a recompiler can only generate a new Forth for the same CPU.

Most PC Forths are now produced with metacompilers, but opinion is divided in the embedded systems field.

The arguments for writing a Forth system in assembler are:

• Metacompilers are cryptic, and you must understand a metacompiler thoroughly before you can use it.

• Assemblers are understood by the average programmer.

• An assembler is almost always available for a new CPU.

• Assemblers handle many optimizations (such as choosing between short and long branches).

• Assemblers handle forward references and peculiar addressing modes, which many metacompilers do not.

• Assemblers work with familiar editing and debugging tools.

• The code generation is completely visible; nothing is hidden from the programmer.

• It is easier to change the Forth model, since many design decisions affect the internals of a metacompiler.

The arguments for using a metacompiler are:

• You write normal-looking Forth code, which is easier to read and debug.

• Once you understand your metacompiler, you can port it easily to new CPUs.

• The only tool you need is a Forth system for your computer. This is particularly relevant for people who do not own a PC, since many cross-assemblers require a PC or a workstation.

I have written several Forths each way, so I am painfully aware of the tradeoffs. I admit a preference for metacompilers: I find the Forth code for MAX much easier to read and understand than its assembler equivalent. Most of the arguments against metacompilers have been overcome by modern "professional" compilers, and if you use Forth for work I strongly recommend considering a commercial product. Alas, public-domain metacompilers (including my own) are still behind the times and clumsy.

So I will provide the basic material for Forth programmers and let you choose for yourself. I will publish the MC6809 code in metacompiler form, and supply a metacompiler for F83 (IBM PC, CP/M, or Atari ST). The Z80 code will be written for a CP/M assembler. The 8051 code will be written for a public-domain PC cross-assembler.

Writing a Forth system in C?

This article would be incomplete without a discussion of a newer approach: writing a Forth system in C. C is more portable than assembler; in theory, all you need to do is recompile the same source code for any CPU.

The disadvantages of this approach are:

• There is less flexibility in the design decisions.

• To add new primitives, you must recompile the C source.

• Some Forths written in C use inefficient threading techniques, such as a CASE statement.

• Most C compilers generate less efficient code than a good assembly language programmer.

But for Unix systems and RISC workstations, which discourage assembly language programming, this may be the only way to get a Forth running. The most complete and widely used public-domain Forth in C is TILE. If you are not running Unix, take a look at the file hence4th_1.2.a instead.

In order to continue the previous comparison, let's first take a look at the Hense 4th MAX definition. For the sake of clear, I went to the dictionary:

_max()
{
    OVER OVER LESS IF SWAP ENDIF DROP
}

Instead of assembler, C is used to write the CODE words of the kernel. For example, here is HenceFORTH's definition of SWAP:

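_swap()
{
    register cell i = *(dsp);   /* save the top of stack */
    *(dsp) = *(dsp + 1);        /* second item becomes the top */
    *(dsp + 1) = i;             /* old top becomes the second item */
}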

Note that there are quite different techniques for writing Forth words in C, so these words may look very different in CForth or TILE.

On an MC68000 or SPARC workstation, this kind of coding can produce very good code. However, even if you plan to write a Forth in C, you still need to understand how Forth works in assembly language. So please keep reading.

References

[CAS80] Cassady, John J., METAFORTH: A Metacompiler for Fig-Forth, Forth Interest Group (1980).

[MIS90] HenceFORTH in C, Version 1.2, distributed by The Missing Link, 975 East Ave. Suite 112, Chico, CA 95926, USA (1990). This is a shareware product available from the GEnie Forth Roundtable.

[Rod91] Rodriguez, B.J., letter to the editor, Forth Dimensions XIII:3 (Sep/Oct 1991), p. 5.

[Rod92] Rodriguez, B.J., "Principles of Metacompilation," Forth Dimensions XIV:3 (Sep/Oct 1992), XIV:4 (Nov/Dec 1992), and XIV:5 (Jan/Feb 1993). Note that the published code is for a fig-Forth variant and not F83. The F83 version is on GEnie as chromium.zip.

[Ser91] Sergeant, Frank, "Metacompilation Made Easy," Forth Dimensions XII:6 (Mar/Apr 1991).

[TAL80] Talbot, R.J., fig-Forth for 6809, Forth Interest Group, P.O. Box 2154, Oakland, CA 94621 (1980).

[TIN91] Ting, C.H., "How Metacompilation Stops the Growth Rate of Forth Programmers," Forth Dimensions XIII:1 (May/Jun 1991), p. 17.

The fifth part Z80 primitive

I submit the code

At last I am ready to present the complete source code for an (I hope) ANSI-compliant Forth: CamelForth. As an exercise, and also to keep the copyright clear, I wrote all of the code from scratch. (Do you know how hard it is not to look at excellent code examples?) Of course, my experience with various Forth systems has no doubt influenced some of the design decisions.

Because of space limitations, the source code will be presented in four installments (if you can't wait, you can download all the files from GEnie):

• Z80 Forth "primitives", written in assembly language;

• 8051 Forth "primitives", also written in assembly language;

• The Z80/8051 high-level kernel;

• The complete 6809 kernel, as metacompiler source.

I plan to implement CamelForth with public-domain tools: for the Z80, the Z80MR assembler under CP/M; for the 8051, the A51 cross-assembler on the IBM PC; and for the MC6809, my own metacompiler under F83 for CP/M, the IBM PC, or the Atari ST.

By "kernel" I mean the set of words that make up a basic Forth system, including the compiler and interpreter. For CamelForth, this is the ANS Forth Core word set plus the non-ANSI words needed to implement it. A Forth kernel usually has two parts: one written in machine code (as CODE words) and one written as high-level definitions. The words written in machine code are called "primitives" because, in the final analysis, the entire Forth system is built from them.

Exactly which words should be written in machine code? Choosing the set of primitives is an interesting exercise. A small set of primitives simplifies porting but costs performance. I have been told that 13 primitives are enough to define all of Forth, though the result is of course a very slow Forth. eForth, which was designed for portability, has 31 primitives.

My rules are these:

• Basic arithmetic, logic, and memory operators are CODE;

• If a word cannot be written easily or efficiently (or at all) in terms of other Forth words, it should be CODE (for example U<, RSHIFT);

• If a simple word is used frequently, CODE may be worthwhile (for example NIP, TUCK);

• If a word takes fewer bytes when written in CODE, write it in CODE;

• If the processor has instructions that support a word's function, use CODE (for example CMOVE or SCAN on the Z80 or 8086);

• If a word juggles many parameters on the stack but has simple logic, it is better in CODE, where the parameters can be kept in registers;

• If the logic or control flow of a word is complex, it is best written as a high-level definition.

For the Z80 CamelForth, I ended up with about 70 primitives (see Table 1).

Table 1

No.    Word       Stack in -- stack out              Description

Core words: these are required by the ANS Forth document

1      !          x a-addr --                        store cell in memory
2      +          n1/u1 n2/u2 -- n3/u3               add n1+n2
3      +!         n/u a-addr --                      add cell to memory
4      -          n1/u1 n2/u2 -- n3/u3               subtract n1-n2
5-6    ...
7      >          n1 n2 -- flag                      test n1>n2, signed
8      >R         x --   R: -- x                     push to return stack
9      ?DUP       x -- 0 | x x                       DUP if top of stack is nonzero
10     @          a-addr -- x                        fetch cell from memory
11-36  ...
37     R>         -- x   R: x --                     pop from return stack
38     R@         -- x   R: x -- x                   fetch from return stack
39     SWAP       x1 x2 -- x2 x1                     swap top two stack items
40     UM*        u1 u2 -- ud                        unsigned 16x16->32 multiply
41     UM/MOD     ud u1 -- u2 u3                     unsigned 32/16->16 divide
42     UNLOOP     --   R: sys1 sys2 --               drop loop parameters
43-45  ...

ANS Forth extensions: these are optional words also defined in the ANS Forth document

46     <>         x1 x2 -- flag                      test not equal
47     BYE        i*x --                             return to the CP/M operating system
48     CMOVE      c-addr1 c-addr2 u --               move bytes from the bottom
49     CMOVE>     c-addr1 c-addr2 u --               move bytes from the top
50     KEY?       -- flag                            return true if a key has been pressed
51     M+         d1 n -- d2                         add single to double
52     NIP        x1 x2 -- x2                        drop the item below the top of stack
53     TUCK       x1 x2 -- x2 x1 x2                  see stack diagram
54     U>         u1 u2 -- flag                      test u1>u2, unsigned

Private extensions: these words belong to the CamelForth implementation

55     (DO)       n1|u1 n2|u2 --   R: -- y1 y2       run-time code for DO
56     (LOOP)     R: y1 y2 -- | y1 y2                run-time code for LOOP
57     (+LOOP)    n --   R: y1 y2 -- | y1 y2         run-time code for +LOOP
58-71  ...
72     USER       n --                               define user variable 'n'

Stack notation:

R: = return stack

c = 8-bit character

flag = Boolean (0 or -1)

n = signed 16-bit number

u = unsigned 16-bit number

d = signed 32-bit number

ud = unsigned 32-bit number

+n = unsigned 15-bit number

x = any cell value

i*x j*x = any number of cell values

a-addr = aligned address

c-addr = character address

p-addr = I/O port address

y = system-specific value

Once I had settled on the Forth model and the target CPU, I developed the system using the following procedure:

• Choose the subset of the ANSI Core word set that will be primitives;

• Following the ANSI descriptions, write assembly language definitions of these words, and add the processor initialization code;

• Run the assembler and fix the source code errors;

• Test that the assembler produces working machine code. I usually add a few lines of assembly code that output one character once initialization is complete. This test is critical: it confirms that your hardware, assembler, downloader (EPROM programmer or whatever), and serial communication all work;

• (Embedded systems only.) Add another assembly code fragment that reads the serial port and echoes what it receives, so that you test communication in both directions;

• Write a high-level Forth fragment that outputs one character, using only Forth primitives (usually something like LIT 33H EMIT BYE). This tests the Forth register initialization, the stacks, and the threading mechanism. Problems at this stage can usually be traced to logic errors in NEXT or the initialization, or to data stack mistakes such as placing a stack in ROM;

• Write a colon definition that outputs one character, and include it in the high-level fragment above. For example, define BLIP as LIT 34H EMIT EXIT, then test the fragment LIT 33H EMIT BLIP BYE. Problems at this stage are usually in DOCOLON, EXIT, or the return stack;

• You can now write some tools to help with development, such as a word that displays a number on the stack in hexadecimal. Listing 1 is a simple test routine that performs a never-ending memory dump (useful even when your keyboard does not work). It tests the primitives DUP, EMIT, EXIT, C@, ><, LIT, 1+, and BRANCH, as well as several levels of nesting. It also avoids DO ... LOOP, which is often difficult to get working. When this code runs, you can have some confidence in your basic Forth model. From here on, test the remaining primitives (DO ... LOOP, UM/MOD, UM*, and DODOES are particularly tricky) and then add the high-level definitions.
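
The two high-level smoke tests in the procedure above can be rehearsed on a host machine before any target hardware exists. The following Python sketch is only a toy model, not CamelForth code: the function `run`, the list-based threads, and the use of Python lists for colon definitions are all assumptions made for illustration. It models NEXT, DOCOLON, and EXIT just closely enough to run the fragments LIT 33H EMIT BYE and LIT 33H EMIT BLIP BYE.

```python
# Toy model of the bring-up tests. A thread is a list of cells; a cell is a
# primitive name, a nested colon definition (a Python list), or the inline
# literal that follows LIT. All names here are illustrative assumptions.

def run(thread):
    """Execute a thread and return the characters that were EMITted."""
    data, rstack, out = [], [], []
    ip = 0                                 # index of the next cell
    while True:
        cell = thread[ip]
        ip += 1                            # NEXT: fetch a cell, advance IP
        if isinstance(cell, list):         # colon definition: DOCOLON
            rstack.append((thread, ip))    # save the caller's thread and IP
            thread, ip = cell, 0
        elif cell == "LIT":                # push the inline literal
            data.append(thread[ip])
            ip += 1
        elif cell == "EMIT":
            out.append(chr(data.pop()))
        elif cell == "EXIT":               # return to the caller
            thread, ip = rstack.pop()
        elif cell == "BYE":
            return "".join(out)
        else:
            raise ValueError(f"unknown cell {cell!r}")

# High-level fragment: LIT 33H EMIT BYE
FRAGMENT = ["LIT", 0x33, "EMIT", "BYE"]

# Colon definition BLIP (LIT 34H EMIT EXIT), called from a second fragment
BLIP = ["LIT", 0x34, "EMIT", "EXIT"]
FRAGMENT2 = ["LIT", 0x33, "EMIT", BLIP, "BYE"]

print(run(FRAGMENT))     # prints 3
print(run(FRAGMENT2))    # prints 34
```

If the second fragment prints only "3", the model's DOCOLON or EXIT logic is at fault, which mirrors the diagnosis suggested for the real kernel.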

Read the source code!

If you want to learn more about how a Forth kernel works and how it is written, study Listing 2. The listing follows these Forth documentation conventions:

WORD-NAME    stack in -- stack out    description

WORD-NAME is the name by which Forth knows the word. Because these names often contain special ASCII characters, an approximation must be used as the assembly language label of the word, such as ONEPLUS for the word 1+.

"Stack in" lists the arguments the word expects on the stack, with the top of stack always on the right. "Stack out" lists the results the word leaves on the stack.

If a word affects the return stack, a return stack comment is also given, marked with R:, for example:

stack in -- stack out    R: stack in -- stack out

ANSI Forth defines a set of abbreviations for stack arguments: in general, n is a signed single-cell number, u is an unsigned single-cell number, c is a character, and so on. See Table 1.

References

[1] Definition of a camel: a horse designed by committee.

[2] Ting, C. H., eForth Implementation Guide, July 1990, available from Offete Enterprises, 1306 South B Street, San Mateo, CA 94402 USA.

[3] Z80MR, a Z80 macro assembler by Mike Rubenstein, is public-domain, available on the GEnie CP/M Roundtable as file Z80MR-A.LBR. Warning: do not use the supplied Z1.COM program; use only Z80MR and LOAD. Z1 has a problem with conditional jumps.

[4] A51, PseudoCorp's freeware Level 1 cross-assembler for the 8051, is available from the Realtime and Control Forth Board, (303) 278-0364, or on the GEnie Forth Roundtable as file A51.ZIP. PseudoCorp's commercial products are advertised here in TCJ.

The source code for Z80 CamelForth is available at ftp://ftp.zetics.com/pub/fort/camel/cam80-12.zip.

Section 6 Z80 Advanced Nuclear

Errata

There are two errors in the CAMEL80.AZM file published in TCJ #67. The minor error is that the name length given in the HEAD macro for the Forth word > was typed as 2 when it should be 1. The major error involves a subtlety of CP/M console I/O. KEY must not echo the typed character, so it uses BDOS function 6. KEY? must not consume the character, so it used BDOS function 11 to test whether a key has been pressed. Unfortunately, BDOS function 6 does not clear the keypress detected by function 11. I have rewritten KEY? to use BDOS function 6. Because this is a "destructive" test, KEY? must save the keypress it "consumes" and return it at the next call of KEY. This new logic can be used with any hardware that provides only a destructive test for a keypress.
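
The save-and-return logic can be shown in a few lines. The sketch below is Python rather than Z80 assembly, and it is not the code from the listing: the class `Console` and the callable `read_nowait` are hypothetical stand-ins for a destructive, non-blocking console read such as the one described for BDOS function 6.

```python
class Console:
    """KEY and KEY? built on a destructive, non-blocking read.

    `read_nowait` is a hypothetical stand-in for the console call: it
    returns a character if one is waiting, else None, and it consumes the
    character it returns.
    """

    def __init__(self, read_nowait):
        self._read = read_nowait
        self._saved = None              # keypress consumed by key_query

    def key_query(self):                # plays the role of KEY?
        if self._saved is None:
            self._saved = self._read()  # destructive test: keep the result
        return self._saved is not None

    def key(self):                      # plays the role of KEY
        while not self.key_query():     # wait for a keypress
            pass
        ch, self._saved = self._saved, None
        return ch

# Simulated keyboard with two keys waiting.
pending = list("ab")
con = Console(lambda: pending.pop(0) if pending else None)
print(con.key_query(), con.key_query(), con.key(), con.key(), con.key_query())
# prints: True True a b False
```

Calling `key_query` twice in a row does not lose a character, because the first call parks the consumed keypress in `_saved` and the second call sees it there.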

High-level definitions

In the last installment I did not expound on the source code. Each "primitive" performed a small, well-defined function. It was almost all Z80 assembly code, and even where it was not obvious why a particular word was included, I hope it was clear what each word did. In this installment I have no such luxury: I will present the high-level definitions and explain the logic of the Forth language. Whole books have been written about the Forth kernel, and if you want to master it completely, please buy one. For TCJ I will limit myself to the key words of the compiler and the interpreter, which are given in Listing 2.

Text interpreter operation

The text interpreter, or "outer interpreter", is the Forth code that accepts input from the keyboard and performs the requested Forth operations. (It is distinct from the address interpreter, or "inner interpreter", which executes compiled threaded code.) The best way to understand this code is to follow the startup of the Forth system.

The CP/M entry point (see the top of the source file) determines the top of available memory, sets the stack pointers (PSP, RSP) and the user pointer (UP), establishes the memory map shown in Figure 1, and then sets the "inner" interpreter pointer (IP) to execute the Forth word COLD.

Figure 1 Z80 CP/M CamelForth memory map

COLD initializes the user variables from a startup table and then executes ABORT. (COLD also tries to execute a Forth command from the CP/M command line.)

ABORT resets the parameter stack pointer and executes QUIT.

QUIT resets the return stack pointer, the loop stack pointer, and the interpret state, and then begins interpreting Forth commands. (The name is apt because QUIT can be used to "quit" an application and return to the top level of Forth. Unlike ABORT, QUIT leaves the contents of the parameter stack alone.)

QUIT is an infinite loop that ACCEPTs a line from the keyboard and then calls INTERPRET to process it as Forth commands. When not compiling, QUIT prints "ok" after each line.
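
A minimal sketch of the QUIT loop, together with the INTERPRET step it calls, may make the control flow easier to follow. It is written in Python under loud assumptions: the class `Forth`, the name-to-function table, and the list that stands in for the dictionary space are all hypothetical, and "compiling" here only appends to that list instead of laying down threaded code.

```python
class Forth:
    """Toy outer interpreter; every name here is an illustrative assumption."""

    def __init__(self):
        self.stack = []
        self.state = 0            # 0 = interpreting, nonzero = compiling
        self.compiled = []        # stands in for the dictionary space
        self.words = {            # name -> (action, immediate flag)
            "+":   (lambda f: f.stack.append(f.stack.pop() + f.stack.pop()), False),
            "DUP": (lambda f: f.stack.append(f.stack[-1]), False),
            "[":   (lambda f: setattr(f, "state", 0), True),   # immediate
            "]":   (lambda f: setattr(f, "state", 1), False),
        }

    def interpret(self, line):
        for token in line.split():                  # space-delimited strings
            entry = self.words.get(token.upper())   # plays the role of FIND
            if entry is not None:
                action, immediate = entry
                if immediate or self.state == 0:
                    action(self)                    # execute now
                else:
                    self.compiled.append(action)    # compile into definition
                continue
            try:
                number = int(token)                 # try to convert a number
            except ValueError:
                self.stack.clear()                  # ABORT: reset the stack
                self.state = 0
                raise LookupError(token + " ?")
            if self.state == 0:
                self.stack.append(number)           # leave it on the stack
            else:
                self.compiled.append(("LIT", number))   # in-line literal

    def quit(self, lines):
        """Interpret each line; record an 'ok' prompt when not compiling."""
        prompts = []
        for line in lines:
            self.interpret(line)
            if self.state == 0:
                prompts.append("ok")
        return prompts

f = Forth()
print(f.quit(["2 3 +", "dup +", "] 7 dup", "["]), f.stack, len(f.compiled))
# prints: ['ok', 'ok', 'ok'] [10] 2
```

The third input line ends in the compile state, so it earns no prompt; the immediate word [ on the fourth line runs even though the system is compiling, which is how the sketch gets back to the interpret state.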

INTERPRET is an almost word-for-word translation of the algorithm given in section 3.4 of the ANS Forth document. It parses one space-delimited string from the input and uses FIND to look for a defined Forth word of that name. If the word is found, it is either executed (if it is an IMMEDIATE word, or if in the "interpret" state, STATE = 0) or compiled into the dictionary (if in the "compile" state, STATE <> 0). If it is not found, INTERPRET tries to convert the string to a number. If that succeeds, LITERAL either leaves the number on the parameter stack (if interpreting) or compiles it as an in-line literal (if compiling). If the string is neither a Forth word nor a valid number, an error message is displayed and the interpreter executes ABORT. This process repeats, string by string, until the end of the input line.

Forth Dictionary

So how does the interpreter "find" a Forth word by its name? The answer: Forth maintains a dictionary containing the names of all Forth words. Each name is associated in some way with the executable code of its word.

There are many ways to store a set of names for searching: a simple array, a linked list, multiple linked lists, a hash table, and so on. Almost any of them will work. Forth's only requirement is that if a name is defined more than once, the search finds the latest definition first.

It is also possible to have several sets of names (called "vocabularies" in older Forths and "wordlists" in ANS Forth). This lets you reuse a name without losing its original meaning. For example, you could have an integer +, a floating-point +, or even a string +. In object-oriented systems this is called "operator overloading".

Each name can be attached to its executable code by physical adjacency in memory, for example with the name placed just before the executable code; this is usually called the "head" of the Forth word. The names can also be kept together in a different region of memory and connected to the executable code by pointers (this is called "separated heads").

You can even have nameless fragments of Forth code, as long as you never need to compile or interpret them by name. ANSI only requires that the ANS Forth words be findable.

The design of a dictionary could fill another article. CamelForth uses the simplest scheme: a single linked list, with the header located just before the executable code. There are no vocabularies, although I may add this capability in a later TCJ article.

Header structure

One question remains: what data is needed in the header, and how is it stored?

The minimum is the name, the precedence bit, and an (explicit or implicit) pointer to the executable code. For simplicity, CamelForth stores the name as a "counted string" (one length byte followed by n characters). Early Forth Inc. products stored only the length of the name and its first three characters. fig-Forth uses a variation of the counted string in which the most significant bit of the last character of the name is set. Other Forth systems use packed strings, and I think even C-style null-terminated strings have been used.

The "precedence bit" is a flag that marks a word as IMMEDIATE. Such words are executed even during compilation, which is how Forth implements compiler directives and control structures. There are other ways to distinguish compiler directives, for example by placing them in a separate dictionary. Many Forth systems keep this flag in the length byte. I use a separate byte so that the "ordinary" string operators (such as S= in FIND, and TYPE) can be used on the name.

If names are kept in a linked list, each header needs a link. Usually the latest word is at the head of the list, and the links point to earlier words. This satisfies the ANSI (and traditional) requirement for redefined words. Charles Curley studied the placement of the link field and found that compilation is faster if the link comes before the name (rather than after it, as in fig-Forth).

Figure 2 shows the CamelForth header structure and compares it with the headers of fig-Forth, F83, and Pygmy Forth. The "view" field of F83 and Pygmy is an example of how other useful information can be kept in the Forth header.
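
To make the header discussion concrete, here is a small Python sketch of a linked-list dictionary whose headers follow the field order described in the text: a link that comes before the name, a separate immediate byte, and a counted-string name. Everything else is an assumption made for illustration, including the class name `Dictionary`, the two-byte little-endian link, the 16-bit value stored after the name as a stand-in for the code field, and the use of address 0 to mean "end of list". It is not a byte-exact copy of the CamelForth layout.

```python
import struct

class Dictionary:
    """Headers laid out as: link (2 bytes), immediate flag (1), length (1), name."""

    def __init__(self):
        self.mem = bytearray(4)   # reserve low addresses so 0 can mean "none"
        self.latest = 0           # address of the newest length byte

    def create(self, name, xt, immediate=False):
        raw = name.encode("ascii")
        self.mem += struct.pack("<HB", self.latest, 1 if immediate else 0)
        self.latest = len(self.mem)           # links point at the length byte
        self.mem += bytes([len(raw)]) + raw   # counted string
        self.mem += struct.pack("<H", xt)     # stand-in for the code field

    def find(self, name):
        """Return (xt, immediate) for the newest match, or None."""
        raw = name.encode("ascii")
        nfa = self.latest
        while nfa:
            count = self.mem[nfa]
            if self.mem[nfa + 1 : nfa + 1 + count] == raw:
                (xt,) = struct.unpack_from("<H", self.mem, nfa + 1 + count)
                return xt, bool(self.mem[nfa - 1])
            (nfa,) = struct.unpack_from("<H", self.mem, nfa - 3)   # follow link
        return None

d = Dictionary()
d.create("DUP", 0x0100)
d.create("IF", 0x0200, immediate=True)
d.create("DUP", 0x0300)          # a redefinition hides the older DUP
print(d.find("DUP"), d.find("IF"), d.find("NOPE"))
# prints: (768, False) (512, True) None
```

Because the search starts at the newest header and follows links toward older ones, the most recent DUP is found first, which is the one behavior the dictionary is required to provide.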

Note: it is very important to distinguish the "head" of a word from its "body" (the executable code). They need not be stored together. Headers are needed only while compiling and interpreting; a "purely executable" Forth system needs no headers at all. However, a legitimate ANSI Forth system must have headers, at least for the words in the ANS Forth word sets.

When "compiling" a Forth system from assembly code, you can define macros to build these headers (see HEAD and IMMED in CAMEL80.AZM). In the Forth environment, the header and the code field are built by CREATE.

Compiler operation

We now know enough to understand the Forth compiler. The word : starts a new high-level definition: it builds a header for the word (CREATE), changes its code field to "docolon", and switches to the compile state (]).

Recall that in the compile state, every word the text interpreter encounters is compiled into the dictionary instead of being executed. This continues until the text interpreter encounters the word ;. This is an IMMEDIATE word, so it is executed: it compiles an EXIT to end the definition and then switches back to the interpret state ([).

In addition, : hides the new word and ; reveals it (by setting and clearing a "smudge" bit in the header or name). This allows a Forth word to be redefined in terms of its own earlier definition. To force a reference to the word being defined, use RECURSE.
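
The effect of hiding and revealing is easiest to see in a small model. The Python sketch below is an illustration under stated assumptions, not the CamelForth implementation: the class `ColonCompiler`, the dictionary entries held as Python dicts, and the `hidden` key standing in for the smudge bit are all hypothetical.

```python
class ColonCompiler:
    """Toy dictionary showing how ':' hides a word and ';' reveals it."""

    def __init__(self):
        self.entries = []                      # oldest first, newest last

    def add_primitive(self, name):
        entry = {"name": name, "body": None, "hidden": False}
        self.entries.append(entry)
        return entry

    def find(self, name):
        for entry in reversed(self.entries):   # newest definition first
            if entry["name"] == name and not entry["hidden"]:
                return entry
        return None

    def colon(self, name, body_text):
        """Compile ': name body_text ;' and return the new entry."""
        new = {"name": name, "body": [], "hidden": True}    # ':' hides it
        self.entries.append(new)
        for token in body_text.split():
            target = new if token == "RECURSE" else self.find(token)
            if target is None:
                self.entries.pop()             # abandon the definition
                raise LookupError(token + " ?")
            new["body"].append(target)
        new["body"].append("EXIT")             # ';' compiles EXIT ...
        new["hidden"] = False                  # ... and reveals the word
        return new

c = ColonCompiler()
old = c.add_primitive("EMIT")
new = c.colon("EMIT", "EMIT EMIT")     # redefined in terms of the old EMIT
again = c.colon("LOOPY", "EMIT RECURSE")
print(new["body"][0] is old, c.find("EMIT") is new, again["body"][1] is again)
# prints: True True True
```

While the new EMIT is hidden, the search skips it and compiles references to the older EMIT; only RECURSE produces a reference to the definition in progress.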

As we can see, Forth has no "compiler" in the sense of a C or Pascal compiler. The Forth compiler is embodied in the actions of various Forth words. This makes it easy to change or extend the compiler, but it makes it difficult to build a Forth application that has no "built-in" compiler.

Related word sets

There are many other Forth words that are either:

• needed to implement the compiler or interpreter, or

• provided as programming conveniences.

But one set of words deserves special attention: the words I have put in the file CAMEL80D.AZM.

One goal of the ANSI Forth standard is to hide the CPU and the implementation model from the application (direct or indirect threading, 16 or 32 bits). To achieve this, several words had to be added to the standard. I have taken this a step further and tried to encapsulate these model dependencies within the kernel as well. Ideally, the high-level Forth code in the file CAMEL80H.AZM should be the same for all CamelForth targets (although different assemblers have different syntax). Differences in cell size and word alignment are handled by the ANS Forth words ALIGN, ALIGNED, CELL+, CELLS, CHAR+, and CHARS, and by my own word CELL (equivalent to 1 CELLS, but smaller when compiled).

The words COMPILE, !CF ,CF !COLON and ,EXIT hide the peculiarities of the threading model, such as how threads are represented and how the code field is implemented.

The value of these words becomes obvious when you compare the direct-threaded Z80 with the subroutine-threaded 8051.

The implementation of the high-level branch and loop operators is hidden in the same way by the words ,BRANCH ,DEST and !DEST. I have tried to invent, rather than borrow from existing Forth systems, the smallest set of operators that can factor out these differences. Only time, expert criticism, and many CamelForth ports will show how well I have succeeded.

So far I have not managed to factor the differences in header structure into a similar word set. FIND and CREATE are so closely tied to the header contents that I have not found suitable subfactors. I have made a start with the words NFA>LFA, NFA>CFA, IMMED?, HIDE, and REVEAL, and the ANS Forth words >BODY and IMMEDIATE, and I will continue this work. Fortunately, it is practical to use the same header structure for all CamelForth implementations (as long as they are byte-addressed 16-bit Forth systems).

Next I will present the 8051 kernel and describe how the Forth compiler and interpreter are adapted to a Harvard architecture (a computer, such as the 8051, with logically separate program and data memories). For the 8051 I will give the files CAMEL51 and CAMEL51D, but not CAMEL51H, because apart from assembler format the high-level code does not differ from what I have discussed here, and the editor of this journal needs the space for other articles. Fortunately, all of the code can be downloaded.

Link - in CamelForth and fig-Forth, points to the length byte of the previous word. In Pygmy Forth and F83, points to the link field of the previous word.

P - precedence bit; 1 for an IMMEDIATE word. Not used in Pygmy Forth.

S - smudge bit, which prevents FIND from finding this word.

1 - in fig-Forth and F83, the most significant bit (bit 7) of the length byte and of the last character of the name is set to 1.

View - in Pygmy Forth and F83, the number of the source block that contains this word.

References

1. Derick, Mitch and Baker, Linda, Forth Encyclopedia, Mountain View Press, Route 2 Box 429, La Honda, CA 94020 USA (1982). Word-by-word description of fig-Forth.

2. Ting, C. H., Systems Guide to fig-Forth, Offete Enterprises, 1306 South B Street, San Mateo, CA 94402 USA (1981).

3. Ting, C. H., Inside F83, Offete Enterprises (1986).

4. Ewing, Martin S., The Caltech Forth Manual, a Technical Report of the Owens Valley Radio Observatory (1978). This PDP-11 Forth stored a length, four characters, and a link in two 16-bit words.

5. Sergeant, Frank, Pygmy Forth for the IBM PC, version 1.4 (1992). Distributed by the author, available from the Forth Interest Group (P.O. Box 2154, Oakland CA 94621 USA) or on GEnie.

6. J. E. Thomas examined this issue thoroughly when converting Pygmy Forth to an ANSI Forth. No matter what tricks you play with relinking words, strict ANSI compliance is violated. A regrettable decision on the part of the ANS Forth team.

7. In private communication.

The source code for Z80 CamelForth is now available on GEnie as CAMEL80.ARC in the CP/M and Forth Roundtables. Really. I just uploaded it.

The source code for Z80 CamelForth is also available at ftp://ftp.zeteletics.com/pub/forth/camel/cam80-12.zip.

Seventh part 8051 Camel Forth

At our esteemed editor's request, I present CamelForth for the 8051; CamelForth for the MC6809 will follow. This 8051 Forth occupies about 6K bytes of program memory. The full source code would take up 16 pages of TCJ, however, so this article shows only the main changes made in porting the kernel. These show how the high-level code is modified for the 8051 assembler format and for subroutine threading.

Z80 errata

In the file CAMEL80H.AZM, the definition of DO is given as

['] xdo ,BRANCH ..

It should be

['] xdo ,XT ..

This makes no difference on the Z80 (where ,BRANCH and ,XT are equivalent), but it is obvious on the 8051.

8051 CamelForth model

In TCJ #60 I summarized the design decisions for an 8051 Forth. To recap: the 8051's slow memory addressing practically demands subroutine threading. This means that the hardware stack (in the 8051 register file) is the return stack. The parameter stack (that is, the data stack) is in 256 bytes of external RAM, with R0 as the stack pointer. Since that article, I have found that it is better to keep the top of stack (TOS) in DPTR than in R3:R2. The resulting programmer's model also includes an idea from Charles Curley [Cur93]. On a machine like the 8051, the index of the innermost loop can be kept in registers, which makes LOOP and +LOOP faster. DO must still push two values onto the return stack: the old loop index and the new loop limit. UNLOOP must of course restore the old loop index from the return stack, which is a good argument for ANSI's making UNLOOP a separate word. Note that R6:R7 is not the top element of the return stack; it is only the index of the innermost loop.

P2 holds the high byte of the parameter stack pointer (which allows R0 to address external memory). It is also the high byte of the user pointer UP; the low byte of UP is assumed to be 00. I learned the hard way that P2 cannot be read while executing from external ROM, so I keep a copy of P2 in register 8.

I have a neat way to implement BRANCH and ?BRANCH. Because the 8051 model is subroutine-threaded, high-level Forth is compiled as true machine code, and BRANCH can be implemented with an SJMP (or AJMP or LJMP) instruction. ?BRANCH can be implemented with a JZ instruction, provided that the zero/nonzero status of the TOS has been put in the accumulator (A register). A subroutine, ZEROSENSE, does this. So BRANCH and ?BRANCH become:

BRANCH:   SJMP dest

?BRANCH:  LCALL ZEROSENSE
          JZ dest

Similarly, LOOPSENSE and PLUSLOOPSENSE allow a JZ instruction to be used for LOOP and +LOOP. In these cases, UNLOOP must be called to clean up the return stack when the program "falls out" of the loop.

In many places in the assembly language source file, I have replaced the sequence LCALL word RET with the shorter and faster LJMP word. This works as long as "word" is not a return stack operator (such as R> or >R). Wherever possible, LCALL and LJMP are replaced with ACALL and AJMP.

I wrote the 8051 kernel to use Intel byte order (low byte first). Then I discovered that the address compiled into LJMP and LCALL is stored high byte first. To avoid rewriting the entire kernel, I included a byte swap in the words that compile these instructions: COMPILE, !CF and ,CF (all of them in the Dependency word set).
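
The byte-order mismatch is easy to demonstrate. The Python sketch below is only an illustration: the function names `compile_cell`, `compile_call`, and `compile_jump` are hypothetical and do not correspond to words in the kernel. The opcode values are the standard 8051 encodings (LCALL is 12H and LJMP is 02H, each followed by a 16-bit address with the high byte first).

```python
import struct

LCALL, LJMP = 0x12, 0x02     # 8051 opcodes, each followed by addr-high, addr-low

def compile_cell(value):
    """Store a 16-bit cell low byte first, as the rest of the kernel does."""
    return struct.pack("<H", value)

def compile_call(addr):
    """LCALL addr: the instruction wants the high address byte first."""
    return bytes([LCALL]) + struct.pack(">H", addr)

def compile_jump(addr):
    """LJMP addr: same operand order as LCALL."""
    return bytes([LJMP]) + struct.pack(">H", addr)

print(compile_cell(0x1234).hex(), compile_call(0x1234).hex(), compile_jump(0x1234).hex())
# prints: 3412 121234 021234
```

Keeping the swap inside the few words that lay down call and jump instructions means the rest of the code can go on treating every cell as low byte first.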

Harvard architecture

The 8051 uses a Harvard architecture: programs and data are kept in separate memories. In embedded systems these are typically ROM and RAM, respectively. ANS Forth is the first Forth standard to address the restrictions of a Harvard architecture. Briefly, ANS Forth specifies that:

• application programs can only access data memory, and

• all of the operators used to access memory and build data structures must operate in data space.

(See section 3.3.3 of the ANS document [ANS93].) This includes the following Forth words:

@ ! C@ C! DP HERE ALLOT , C, COUNT TYPE WORD (S") S" CMOVE

However, the Forth compiler still needs to access program space (also called code or instruction space), and Forth needs to maintain a dictionary pointer for program space as well as for data space. So I have added these new words:

I@ I! IC@ IC! IDP IHERE IALLOT I, IC, ICOUNT ITYPE IWORD (IS") IS" D->I I->D

The prefix "I" stands for "instruction" (since P and C already have other meanings in Forth). ICOUNT and ITYPE are needed to display strings that have been compiled into ROM. IWORD copies the string left by WORD from data space to code space; it is needed to build Forth headers and strings in ROM. D->I and I->D are equivalents of CMOVE that copy to and from code space.

VARIABLEs must have addresses in data space, so they cannot use the traditional practice of putting the data immediately after the code field. Instead, the data space address of the data is stored after the code field. In essence, a VARIABLE is a CONSTANT whose value is a data space address. (Traditional CONSTANTs are of course still valid.)

CREATEd words, and words built with CREATE ... DOES>, must work the same way. Here is how they look in program space:

CODE word:   ...header... 8051 machine code

high-level:  ...header... 8051 machine code

CONSTANT:    ...header... LCALL-DOCON value

VARIABLE:    ...header... LCALL-DOCON data-adrs

CREATEd:     ...header... LCALL-DOCON data-adrs

Note that CONSTANT must replace the value stored by CREATE, and that : must "un-allot" both this value and the LCALL DOCON.
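
The split between the two address spaces, and the idea that a VARIABLE is a code-space cell holding a data-space address, can be modeled in a few lines. The Python sketch below is a toy under stated assumptions: the class `HarvardMemory` and its method names are hypothetical, cells are 16 bits stored low byte first, and the header and the LCALL DOCON that would precede the stored address are left out.

```python
class HarvardMemory:
    """Two address spaces, each with its own dictionary pointer (toy model)."""

    def __init__(self, size=256):
        self.code = bytearray(size)    # program space
        self.data = bytearray(size)    # data space
        self.idp = 0                   # next free code-space address (IDP)
        self.dp = 0                    # next free data-space address (DP)

    # Data-space operators, in the role of @ ! ALLOT
    def fetch(self, addr):
        return int.from_bytes(self.data[addr:addr + 2], "little")

    def store(self, x, addr):
        self.data[addr:addr + 2] = x.to_bytes(2, "little")

    def allot(self, n):
        self.dp += n

    # Code-space operators, in the role of I@ I! I,
    def ifetch(self, addr):
        return int.from_bytes(self.code[addr:addr + 2], "little")

    def istore(self, x, addr):
        self.code[addr:addr + 2] = x.to_bytes(2, "little")

    def icomma(self, x):
        self.istore(x, self.idp)
        self.idp += 2

    def variable(self):
        """Build a VARIABLE: a code-space cell holding a data-space address."""
        pfa = self.idp
        self.icomma(self.dp)           # the "constant" is the data address
        self.allot(2)                  # reserve the cell in data space
        return pfa

    def run_variable(self, pfa):
        """What the word does when executed: return its data address."""
        return self.ifetch(pfa)

m = HarvardMemory()
m.allot(10)                            # pretend earlier words used some RAM
v = m.variable()
m.store(1234, m.run_variable(v))
print(m.run_variable(v), m.fetch(10), m.ifetch(v), m.idp, m.dp)
# prints: 10 1234 10 2 12
```

On a von Neumann target the two byte arrays would be one and the same, and the code-space operators could simply be aliases of the data-space ones.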

S" presents a special problem. Strings defined with S" ("text literals") must reside in data space, where they can be used by words such as TYPE and EVALUATE. But we expect these strings to be part of a definition and to reside in ROM in a ROM Forth environment. We could store the string in program space and copy it to HERE when it is referenced, but the ANS document does not allow text literals to live in this "transient" storage region (see sections 3.3.3.4 and 3.3.3.6 of the ANS document [ANS93]). Also, if WORD returns its string at HERE, as it does in CamelForth, text literals must not alter this transient region.

My solution is for S" to store the string in code space but also reserve room for it in data space, and to copy it from code space to data space when it is referenced. ANS Forth does not yet solve all the problems of Harvard architecture processors; something like C's "initialized data" region may eventually be required.

Because ." strings can never be accessed by the programmer, they can be stored in code space, using the words (IS") and IS". (These are the "old" (S") and S".) This adds two words to the kernel but saves a good deal of data space. I plan to move the string-literal words to the Dependency word set, or to a new "Harvard" word set.

Writing to program space

8051 does not really write the program memory, no hardware signal, and there is no hardware instruction. In this environment, the CamelForth interpreter works, but does not compile new words. We can try to make some memories simultaneously appear in programs and data spaces. Many 8031 Applications give a method of accessing data and program space simultaneously, implementing some signals on hardware. Figure 1 shows the MCB8031 of my revision of the board, this board is the MCB8031 of Blue Ridge Micros (2505 Plymouth Road, Johnson City, TN, 37601, USA, TELEPHONE 615-335-6696, FAX 615-929-3164). U1A and U1b generate a new strobe signal, as long as the program or data is read, EPROM can be selected (low 32K) when A15 is low, and the RAM is high when A15 is high (high 32K). Of course, you can't write EPROM, but you can execute programs from RAM! There is a disadvantage: this makes @ and i @ 等,, if you use the wrong you, you can't discover it immediately.

Figure 1 Modified 8051 circuit diagram
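
The address decoding described above can be modeled in a few lines of Python. This is a sketch under assumptions: the circuit in Figure 1 is not reproduced here, and the signal names psen_n (program fetch strobe) and rd_n (data read strobe) are the usual active-low 8051 names, used for illustration only.

```python
def select_device(address, psen_n, rd_n):
    """Return which memory device answers a read, or None if no read
    is in progress. psen_n and rd_n are active-low (0 = asserted)."""
    # Combined strobe: active when either a program fetch or a data read occurs.
    read_strobe_active = (psen_n == 0) or (rd_n == 0)
    if not read_strobe_active:
        return None
    # A15 low selects the lower 32K (EPROM); A15 high selects the upper 32K (RAM).
    return "RAM" if address & 0x8000 else "EPROM"
```

In this model a program fetch and a data read at the same address reach the same device, which is exactly why @ and I@ become indistinguishable on the modified board.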

These modifications to the high-level definitions are intended to make CamelForth portable between Harvard-architecture and von Neumann machines. For the latter, the new program-space words are simply equated to their data-space equivalents, for example on the Z80:

IFETCH EQU FETCH

ISTORE EQU STORE

ITYPE EQU TYPE

and so on.

In the next installment, I will modify the 8051 source code so that it can work on the 6809, approaching a truly portable model by successive refinement.

References

[ANS93] dpANS-6 draft proposed American National Standard for Information Systems - Programming Languages - Forth, June 30, 1993. "It is distributed solely for the purpose of review and comment and should not be used as a design document. It is inappropriate to claim compatibility." Nevertheless, for the last 16 months it's all we've had to go by.

[Cur93] Curley, Charles, "Optimization Considerations," Forth Dimensions XIV:5 (Jan/Feb 1993), pp. 6-12.

The source code for 8051 CamelForth is available at ftp://ftp.zeteletics.com/pub/forth/camel/cam51-15.zip

Eighth part MC6809 Camel Forth

Now we come to the last part of this article: the long-promised ANSI CamelForth for the Motorola 6809. This implementation is designed specifically for the Scroungmaster II processor board. Unlike the Z80 and 8051 CamelForth, the MC6809 Forth was generated with my "Chromium 2" Forth metacompiler. Two things will be evident immediately:

• First, the metacompiler runs on an older Forth system (F83), so the source code is contained in 16 x 64 Forth "screens". I have converted these to an ASCII file, but traces of the original format are still evident.

• Second, the source code for the metacompiled Forth looks like ordinary Forth code (with a few small changes, which I will discuss shortly), so that the definition of 1+ becomes:

CODE 1+ 1 # ADDD, NEXT ;C

The assembler used is my MC6809 assembler, which I have described previously [Rod91].

I typed in the high-level source code directly from the already-published listings (converting it to Forth syntax in the process). Unfortunately, this took place over a long period, and I sometimes worked from the Z80 listing and sometimes from the 8051 listing. The result is that the Harvard-architecture constructs (such as I@ and IALLOT) are not used consistently in the MC6809 code. This does not matter for the non-Harvard MC6809, but if this Forth code is to be used for a Harvard-architecture target, I will have to clean up these errors.

In addition, because I was working from the published listings, I often neglected to type in the detailed comments for the high-level word definitions. To see how these words work, refer to the earlier listings.

Source code description of MC6809 Camel Forth

The MC6809 CamelForth model holds the top of stack (TOS) in the D register and uses the S stack pointer for the parameter stack. The U stack pointer is the return stack pointer, and Y is the interpreter pointer. X is the temporary register W. The MC6809 direct page register DPR holds the high byte of the user pointer (the low byte is assumed to be 0).
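
The role of DPR can be illustrated with a little arithmetic. The following Python sketch assumes that user variables are reached the way 6809 direct-page addressing forms an address (DPR supplies the high byte, the instruction supplies the low byte); the function names are hypothetical, and whether the kernel actually uses direct-page instructions for this is not stated here.

```python
def user_pointer(dpr):
    """User pointer implied by DPR: high byte from DPR, low byte assumed 0."""
    return (dpr & 0xFF) << 8


def user_var_address(dpr, offset):
    """Address of a user variable at a one-byte offset within the user page."""
    return user_pointer(dpr) | (offset & 0xFF)
```

For example, with DPR holding 7AH, `user_pointer(0x7A)` gives 7A00H and a variable at offset 10H sits at 7A10H. Keeping the low byte at zero means the user pointer needs no storage of its own beyond the DPR register.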

The memory map for a Scroungemaster II board with 8K of RAM and 8K of EPROM is as follows:

6000-797FH RAM dictionary (for new definitions)

7980-79FFH Terminal input buffer

7A00-7A7FH User area (user variables)

7A80-7AFFH Parameter stack (grows downward)

7B00-7B27H HOLD area (grows downward)

7B28-7B7FH PAD area (general-purpose buffer)

7B80-7BFFH Return stack (grows downward)

E000-FFFFH Forth kernel in EPROM

All of the RAM data areas are referenced to the user pointer, whose starting value is given by UP-INIT: in this case, 7A00H. (Note the use of UP-INIT-HI for the high byte of this value.) When CamelForth starts, it sets the dictionary pointer to DP-INIT, which must be in RAM so that you can add new definitions to the Forth dictionary. These values are all specified with the metacompiler's EQU directive. These EQUs take up no space in the MC6809 kernel, and they do not appear in the MC6809's Forth dictionary.
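
Since every RAM data area is placed relative to the user pointer, the memory map above can be re-derived from UP-INIT alone. The Python sketch below does that; the offsets and sizes are computed from the addresses in the map, while the table layout and function names are illustrative assumptions rather than anything taken from the source code.

```python
UP_INIT = 0x7A00
UP_INIT_HI = UP_INIT >> 8          # high byte of the user pointer

# (name, offset of first byte from the user pointer, size in bytes)
AREAS = [
    ("terminal input buffer", -0x080, 0x080),
    ("user area",              0x000, 0x080),
    ("parameter stack",        0x080, 0x080),
    ("HOLD area",              0x100, 0x028),
    ("PAD area",               0x128, 0x058),
    ("return stack",           0x180, 0x080),
]

# Regions that do not move with the user pointer: (name, first, last).
FIXED = [
    ("RAM dictionary", 0x6000, 0x797F),
    ("EPROM kernel",   0xE000, 0xFFFF),
]


def area_bounds(name, up=UP_INIT):
    """Return (first, last) address of a user-pointer-relative area."""
    for area, offset, size in AREAS:
        if area == name:
            return up + offset, up + offset + size - 1
    raise KeyError(name)


def region_of(address, up=UP_INIT):
    """Name the region containing an address, or None if it is unmapped."""
    for area, offset, size in AREAS:
        if up + offset <= address < up + offset + size:
            return area
    for area, first, last in FIXED:
        if first <= address <= last:
            return area
    return None
```

With this layout, giving a second task its own data areas would only require a different user pointer value, for instance `area_bounds("PAD area", up=0x7000)`.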

DICTIONARY tells the metacompiler where to compile the code, in our case the 8K EPROM. The new dictionary is named ROM, and ROM is then specified to select that dictionary. (If you are familiar with Forth vocabularies, you will see a strong resemblance.) The word AKA defines a synonym for a Forth word. Because the MC6809 is not a Harvard-architecture machine, we should compile @ wherever I@ appears in the source code, and do the same for all the other "I-prefix" (instruction-space) words. AKA does this. Like EQUs, these synonyms do not appear in the MC6809 dictionary.

The metacompiler allows you to use forward references, that is, to refer to Forth words that have not yet been defined (you must, of course, define them before you finish!). This is usually automatic, but AKA requires you to declare the forward reference explicitly with PRESUME, for example:

PRESUME WORD AKA WORD IWORD

is needed to create the synonym IWORD. The words @, !, HERE, and ALLOT are predefined by the metacompiler, so we do not need PRESUME for them.
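
The interplay of PRESUME, AKA, and forward references can be pictured with a toy symbol table. This Python sketch is only a model of the behavior described above; the class, its methods, and the addresses in the usage example are hypothetical and say nothing about how Chromium 2 is built internally.

```python
class TargetSymbols:
    """Toy host-side symbol table with synonyms and forward references."""

    def __init__(self, predefined=()):
        self.addresses = {}            # name -> target address, once defined
        self.presumed = set()          # forward references not yet defined
        self.known = set(predefined)   # names that may be used as AKA targets
        self.aliases = {}              # synonym -> real name (host only)

    def presume(self, name):
        """Declare a word that will be defined later."""
        self.presumed.add(name)
        self.known.add(name)

    def define(self, name, address):
        """Record the target address of a word, resolving any forward reference."""
        self.addresses[name] = address
        self.known.add(name)
        self.presumed.discard(name)

    def aka(self, name, synonym):
        """Make synonym stand for name; name must already be known."""
        if name not in self.known:
            raise KeyError(name + " must be defined or presumed before AKA")
        self.aliases[synonym] = name

    def resolve(self, name):
        """Look up a word or synonym; synonyms never get their own entry."""
        return self.addresses[self.aliases.get(name, name)]

    def unresolved(self):
        """Forward references still waiting for a definition."""
        return sorted(self.presumed)
```

A usage example with made-up addresses:

```python
t = TargetSymbols(predefined=["@", "!", "HERE", "ALLOT"])
t.aka("@", "I@")            # no PRESUME needed for a predefined word
t.presume("WORD")
t.aka("WORD", "IWORD")      # allowed because WORD was presumed
t.define("WORD", 0xE123)    # hypothetical address
```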

CODE definitions are straightforward. Note that you can use:

HERE EQU labelname

to generate a label when metacompiling (this is a function of the metacompiler, not of the assembler). In addition, ASM: begins a fragment of assembler code (that is, code that is not part of a CODE word).

The phrase

HERE RESOLVES name

is used to resolve certain forward references made by the metacompiler (for example, the metacompiler needs to know where the code for DOCOLON is). You should leave these alone. Otherwise, you are free to add CODE definitions to the source code.

Defining words and control structures (IMMEDIATE words) are more difficult to understand, because these words must perform some actions during metacompilation. For example, the MC6809 Forth includes the standard word CONSTANT to define new constants. But many CONSTANT definitions also appear within the MC6809 kernel, so we may need to define a CONSTANT while metacompiling. The EMULATE: phrase tells the metacompiler how to handle a CONSTANT definition when it encounters one. This phrase is written entirely in metacompiler words, so it can look completely incomprehensible.

Similarly, IF, THEN, and other words of that kind contain metacompiler phrases that build and resolve branches in the MC6809 image. Some metacompilers hide these functions inside the compiler. That produces prettier target code, but if you need to change the way branches work, you must modify the metacompiler.

I prefer to make these actions easy to modify, so I chose to have Chromium put them in the target source code. (The worst examples are the definitions of TENDLOOP and TS", which actually extend the metacompiler's vocabulary in the middle of the target source code.)

If you are new to Forth and to metacompilers, the best approach is simply to accept these as given. "Ordinary" colon definitions are easy to add; just follow the examples in the rest of the MC6809 source code. You can even write CREATE ... DOES> words, as long as you do not need to use them within the metacompiler.

On a 1 MHz MC6809, a line of text input takes a noticeable time to process (about 1 second, at a rough estimate). This is partly because so much of the interpreter is written in high-level Forth, and partly because CamelForth uses a singly linked list for its dictionary. These handicaps affect only compilation speed, not execution speed. Still, the delay is annoying; perhaps one day I will write an article on "accelerating Forth".
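
Why a singly linked dictionary slows down the text interpreter can be seen in a small model. The Python sketch below is illustrative only: the Entry class and the find function are hypothetical, and the real dictionary headers hold more than a name and a link.

```python
class Entry:
    """One dictionary header: a name plus a link to the previous entry."""

    def __init__(self, name, link):
        self.name = name
        self.link = link


def find(latest, name):
    """Search from the newest entry back to the oldest.
    Returns (entry or None, number of headers examined)."""
    steps = 0
    entry = latest
    while entry is not None:
        steps += 1
        if entry.name == name:
            return entry, steps
        entry = entry.link
    return None, steps


def build(names):
    """Define words in order and return the newest entry."""
    latest = None
    for name in names:
        latest = Entry(name, latest)
    return latest
```

Every word typed on a line triggers one such search, and an unknown word, or a number, walks the entire chain before the interpreter can move on. The cost is paid only when text is interpreted or compiled; a compiled definition never searches the dictionary when it runs.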

At present, the user pointer UP never changes. The purpose of having a UP is to support multitasking: each task has its own user area, stacks, and so on. I will be working on this soon. I may also explore using the memory management of the SM II to give each task its own private 32K dictionary. And of course, I intend to write a true multiprocessor Forth kernel using the shared bus. If I live long enough, I should also write a distributed Forth kernel using the serial ports.

Version 1.0 of the MC6809 CamelForth source code is available in GEnie's Forth Roundtable as the file CAM09-10.ZIP. This file also contains the Chromium 2 metacompiler, ready to use. If you have F83, you can enter:

F83 CHROMIUM.SCR

1 LOAD

BYE

This loads the metacompiler, compiles the MC6809 CamelForth, and writes the result to the Intel hex format file 6809.HEX. Note: if you use the CP/M or Atari ST version of F83, you must edit the load screen to remove the hex file utility, because that utility has only been written for MS-DOS. I have not tested Chromium 2 under CP/M or on the Atari ST. If you need help, please contact me.
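
For readers unfamiliar with the output format, here is a minimal Python sketch of how a single Intel hex record is formed. It illustrates the file format only and is not the hex file utility shipped with Chromium 2; the function name is hypothetical.

```python
def hex_record(address, data, record_type=0x00):
    """Build one Intel hex record: byte count, 16-bit address, record type,
    data bytes, then a checksum that makes all bytes sum to zero modulo 256."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF,
                  record_type]) + bytes(data)
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


# Record type 01 with no data marks the end of the file.
EOF_RECORD = hex_record(0, b"", 0x01)
```

A complete file is a sequence of data records (type 00), typically 16 or 32 data bytes each, followed by the end-of-file record. An EPROM programmer uses the address field of each record to place the kernel image at its target location.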

References

[Rod91] Rodriguez, B. J., "B.Y.O. Assembler," The Computer Journal #52 (Sep/Oct 1991) and #54 (Jan/Feb 1992).

[Rod92] Rodriguez, B. J., "Principles of Metacompilation," Forth Dimensions XIV:3 (Sep/Oct 1992), XIV:4 (Nov/Dec 1992), and XIV:5 (Jan/Feb 1993). Describes the "Chromium 1" metacompiler.

The source code for MC6809 CamelForth is available at ftp://ftp.zeteletics.com/pub/forth/camel/cam09-10.zip
