Contents | ELIZA | Appendix: The Lisp Editor
On its most fundamental level, a Lisp interpreter consists of little more than a set of routines to handle some fancy pointer manipulation (a pointer being an address of some location in memory). This simplicity is a direct result of the uniformity of Lisp data structures. As described below, the primary data structure, the list, maps quite easily onto an equivalent internal representation. The interpreter's simplicity is further enhanced by the non-necessity of a sophisticated parser, due to Lisp's rather simple syntax. Moreover, because Lisp is by definition a recursive language, the interpreter may be defined recursively as well, substantially reducing its complexity.
The P-Lisp workspace is divided into four-byte units called "cells." The first two bytes of a cell are called (naturally enough) the CAR of the cell, and the last two bytes are the cell's CDR. The cells are aligned on contiguous four-byte boundaries, so that the last two bits of a cell's address are always zero (see below). This is necessary because the last two bits may then be used as status flags or to describe a cell's contents.
Cell 0 Cell 1 Cell 2 Cell 3 -------------------------------------------------------- || | || | || | || | || || CAR | CDR || CAR | CDR || CAR | CDR || CAR | CDR ||... || | || | || | || | || -------------------------------------------------------- Byte 0 1 2 3 4 5 6 7 8 9 A B C D E F 10Figure 27.1. Memory Cells
For example, the CAR or CDR of a cell will typically contain the address of another cell. Bit 1 of such a pointer is used to indicate the type of data it is pointing at. If the bit is set, the pointer (with the bit reset) points to an atom; if the bit is reset, the pointer points to a list. A list is represented as a linked list of cells. For example, the list (A B C) is represented simply as
--------- --------- --------- | | | | | | | | | | | -----> | | -----> | |NIL| | | | | | | | | | | | | --|------ --|------ --|------ | | | v v v Pointer Pointer Pointer to A to B to CFigure 27.2. List Representation
The CDR of the last box is a pointer to the atom NIL. To make life easy for the interpreter, NIL is predefined to live at a location in memory where the hi-byte of its address is guaranteed to be zero (such an address is said to reside in "page zero" of memory). So, if the first byte of a pointer is zero, the interpreter assumes that the pointer points to NIL, regardless of the value of the second byte.
Note in Figure 27.2 how the list (A B C) maps directly onto its internal representation. The CAR of (A B C) is A, and the CAR of the first cell of the list points to the atom A. The CDR of (A B C) is (B C), and the CDR of the first cell points to the internal representation of (B C). You should be able to see already that to evaluate (CAR '(A B C)), for example, the CAR subroutine simply returns the CAR of the first cell of (A B C).
P-Lisp supports three types of atoms: literal atoms, integer atoms, and floating-point atoms. Literal atoms are stored with the following format:
--------- --------- | | | | | | | | -----> | | -----> Pointer to | | | | | | | | property list --|------ --|------ (could be NIL) | | v v Pointer to Pointer to value print-name listFigure 27.3. Literal Atom
As an example, the atom APPLE with the value FRUIT and the property list (COLOR RED) would have the following format:
--------- --------- --------- --------- | | | | | | | | | | | | | | -----> | | -----> | | -----> | |NIL| | | | | | | | | | | | | | | | | --|------ --|------ --|------ --|------ | | | | v | v v | Pointer to | Pointer to Pointer to FRUIT | COLOR RED v --------- --------- --------- | | | | | | | | | |A P| -----> |P L| -----> |E |NIL| | | | | | | | | | --------- --------- ---------FIgure 27.4. Literal Atom
--------- | | | | |NIL| | | | | --|------ --------- | | | | ----------> | |NIL| | | | --------- ^ | | --- 16-bit valueFigure 27.5. Integer Atom
--------- | | | | |NIL| | | | | --|------ --------- --------- | | : | | | : | -----------> | : | -----> | : | | : | | | : | --------- --------- ^ ^ | | | | Exponent & Mantissa & mantissa sign byteFigure 27.6. Floating-point Atom
All literal atoms, including the built-in atoms (T, NIL, and the built-in functions), are organized into a single list within the workspace, called the OBLIST. Literal atoms are always unique; for some atom A, there can be only one instance of A in the workspace. All references to this atom are pointers to the single location where A resides. Numeric atoms, however, are not unique; there can be any number of instances of a given numeric value.
When the Lisp READ routine reads a s-expr, the s-expr is parsed and an internal representation of the s-expr is built. For every literal atom, the interpreter scans the OBLIST for that atom, comparing the atom's print name to those of the atom on the OBLIST. If the atom is found, the pointer to the atom (with the atom bit set) is returned; otherwise, an internal representation of the new atom is built and added to the end of the OBLIST, and a pointer to the new atom is returned. If READ reads a numeric atom, it simply builds a new atom (integerized, if possible) and returns its pointer. Although unique numeric atoms would provide a substantial memory savings, vital for any microcomputer implementation, the result would be a severe degradation of performance in any application involving extensive calculations (consider that, if numeric atoms were unique, the OBLIST would have to be scanned after every calculation to determine if the result already lives on the OBLIST).
After READ builds the linked list of cells representing the just-read s-expr, a pointer to the list is passed on to the evaluator, EVAL. EVAL determines if the s-expr is an atom or a list by the status of the atom bit. If the bit is set, the s-expr is an atom, and EVAL evaluates it in the following way: If the atom is numeric (indicated by the CDR of the first cell being NIL), EVAL just returns the pointer to the atom as the result. In the case of a literal atom, EVAL first scans the environment chain (see discussion on environment chain below) for the atom, and returns the corresponding value if found. If the atom is not on the chain, EVAL uses the atom's OBLIST entry, returning the CAR of the first cell as its value.
If EVAL is handed a list to evaluate, the CAR of the first cell is used to determine the function to be applied. If the pointer points to an atom, EVAL scans the property list of the atom for the EXPR property. If found, the corresponding property value, the function definition, is evaluated, with the CDR of the first cell handed to EVAL used as the pointer to the argument list. If EVAL does not find the EXPR property, the property list is scanned again, this time searching for the SUBR property. If this property is found, the corresponding property value is used as the address of the interpreter subroutine that evaluates this SUBR. This routine is called, passing to it the pointer to the argument list. If the CAR of the first cell handed to EVAL points to a list, EVAL calls itself to evaluate this list, applying the end result of this evaluation (which must be an EXPR or SUBR) to the argument list.
The SUBR routines themselves consist primarily of building new cells, deleting old cells, and moving pointers around. Consider, for example, the function CONS. Recall that (CONS 'A '(B)) creates a new list (A B), the CAR of which is A, and the CDR of which is (B). This is accomplished in the interpreter by simply getting a new cell and storing the pointer to A in the CAR and the pointer to (B) in the CDR. The pointer to the new cell now points to the result, the list (A B).
The environment chain contains the active LAMBDA-bindings during a given evaluation. LAMBDA-bindings are stored as cell pairs; the CAR of the first cell points to the formal argument, and the CAR of the second cell points to the value bound to the formal argument. The end of the environment is marked by a NIL formal argument, with the CDR of that cell pointing to the next environment on the chain. Below is a sample environment chain.
--------- --------- --------- --------- --------- | | | | | | | | | | | | | | | | | -----> | | -----> | | -----> | | -----> |NIL| | | | | | | | | | | | | | | | | | | | | | --|------ --|------ --|------ --|------ ------|-- | | | | | v v v v v Formal X Value bound Formal Y Value bound Pointer to X to Y to next environmentFigure 27.7. Environment Chain
During the course of an evaluation, cells are constantly being used and discarded as required by the interpreter, for example, by building and discarding environments. Initially, all free cells in a workspace are organized into a linked list called the free-space list. New cells are taken from this list as needed by the interpreter. When the interpreter runs out of new cells, a routine called the Garbage Collector is invoked. The Garbage Collector scans the entire workspace, collecting all discarded cells into a new free-space list. This collection is accomplished in two phases, the Mark phase and the Sweep phase. During the Mark phase, all cells that are currently "active," that is, those cells that are attached to the OBLIST or those that are part of the environment or the recursion stack, are marked as active. This is done by setting the garbage collector bit, bit 0, of the CDR of each cell. Cells that are not marked at the end of the Mark phase cannot be reached via a pointer path either through the OBLIST, the environment chain, or the recursion stack; these cells are considered free and reusable. During the Sweep phase of garbage collection, the free cells are collected into a new free-spacelist. This is accomplished by scanning the workspace from top to bottom. All cells that are marked are simply unmarked, while cells that are initially unmarked are added to the freespace list. After this phase is completed, the interpreter continues processing where it left off when the Garbage Collector was invoked.
Contents | ELIZA | Appendix: The Lisp Editor