Home | Up | Questions? |
The design and implementation work for a full-scale PASCAL compiler on Nova 3/D minicomputer began in January 1978. This report summarizes the efforts as of April 28, 1978. Data structure and implementation notes are discussed in details while the future design and implementation work are roughly sketched. The language implemented is formally presented in section Three. The differences between this implementation and the standard PASCAL are outlined and justified.
A one-pass compiler in general requires significant amount of memory to hold the compiler code and the relevant data structure such as string table and symbol table in core. For most mini- and microcomputer system, this is not practical and may not even be feasible. A one-pass compiler also tends to produce rather poor code which is not acceptable to most mini- and microcomputer applications. For instances, real-time control applications and system programming language. A minimum of two-pass compiler is proposed and designed which has the following advantages:
Note that the first pass of the compiler does not even include memory allocation which makes itself completely target machine independent. The symbol table will be passed to the second pass in a condensed form. The compiler could be constructed to produce code on a procedure by procedure basis. However, this requires some kind of self-overlaying structure which might not be feasible. Therefore, the variables and labels for all procedures are mapped and passed to the second pass.
The second pass will generate assembly language code which makes the code generation task easier to handle. If MACRO assembly language is supported by the target machine, it should even further reduce the amount of efforts. Some kind of machine independent optimizations is expected in the second pass.
letter letter digit
digit
+ constant identifier - unsigned integer ' character '
unsigned integer NIL constant identifier ' character '
variable identifier , field identifier [ expression ] . field identifier
type identifier ( identifier ) , constant .. constant
type identifier simple type packed file OF simple type RECORD field list END ARRAY [ simple type ] OF type ,
; , identifier : type CASE identifier : type identifier OF , ; constant : ( field list )
VAR , ( identifier : type ) PROCEDURE identifier formal parm list FUNCTION identifier formal parm list : type ;
, ( expression ) procedure identifier function identifier
variable unsigned constant function identifier actual parm list ( expression ) NOT factor
factor * / DIV MOD AND
+ term - + - OR
simple expr > < >= <= <> = simple expr
variable := expression function identifier
IF expression THEN stmt ELSE stmt
CASE expression OF constant : stmt END , ;
FOR variable identifier := expression DOWNTO TO expression DO stmt
WHILE expression DO stmt
REPEAT stmt UNTIL expression ;
WITH record variable DO stmt ,
BEGIN stmt END ;
procedure identifier actual parm list
GOTO unsigned integer
unsigned integer : assign stmt if stmt case stmt for stmt while stmt repeat stmt with stmt compound stmt call stmt goto stmt
LABEL unsigned integer ; ,
CONST identifier = expression ;
TYPE identifier = type ;
VAR identifier : type ; ,
PROCEDURE identifier formal parm list FUNCTION identifier formal parm list type : EXTERNAL ; block ; FORWARD
label part constant part type part variable part procedure part stmt
block .
The differences between this implementation and standard PASCAL is listed and discussed briefly :
For convenience, the reserve words are listed as shown below :
AND ARRAY BEGIN CASE CONST DIV DO DOWNTO ELSE END FILE FOR FORWARD FUNCTION GOTO IF LABEL MOD NIL NOT OF OR PACKED PROCEDURE RECORD REPEAT THEN TO TYPE UNTIL VAR WHILE WITH EXTERNAL
Also, the standard identifiers are listed as shown below :
The major data structure for the scanner is the string table(STRTABLE). There is no practical limit on the length of an identifier. Complete character string will be stored in the string table and the length and a pointer will be returned to the paser.
The scanner returns four global parameters each time being called:
Several other variables such as line number(LINENUMBER) and column pointers(LINEPTR and LASTLINEPTR) are maintained for listing purposes. Reserve words are organized by hashing technique. Collisions are resolved by chaining them together and ended by a pointer which points to the last element itself. Several issues in the scanning phase are treated in the following ways:
Recursive descent parsing algorithm is employed in this compiler. The 'IF..THEN IF..THEN..ELSE...' ambiguity is resolved in such a way that the ELSE statement is always associated with the closest IF.
In a few cases, continuation of the parsing algorithm requires some kind of semantic information:
Symbol table is maintained in a stack(SYMTABLE). Therefore, symbol table entries can be reused due to the block structure nature of PASCAL language. Global(level 1) and intrinsic(level 0) declarations are the major memory-consuming factors. This concept also applies to the string table. However, a static display(DISPLAY) is necessary to maintain information for the nested blocks. Each symbol table entry will be associated with a unique number and will be mapped out along with the intermediate code segments. This mapping facilitates the symbol table accessing in the second pass since there is no search required.
Each symbol table entry has four common parts:
Hashing technique is used for symbol table search. An array of bucket pointers is maintained for the symbol table search purpose. Not all the symbol table entries are linked into the search structure. Field identifiers are not linked into the search structure. However, separate display information for WITH statement processing and RECORD type symbol table construction is necessary. Unnamed symbol table entry will not be linked into the search structure either. For example :
BINX : RECORD PARTA, PARTB : INTEGER; END;The variable identifier BINX will point to a record entry which does not have a name.
Type declaration parts allow forward reference except for the type of tag identifier which is necessary for VARIANT processing. For examples :
LINK = PTR; LINKX = PTR; NEW = OLD; PTR = RECORD X,Y : OLD END; OLD = 0..100;
Label value is stored in the symbol table entry and the entry is linked for search purpose. The hashing value for the unnamed label is the value module bucket size. These external labels will be combined with the internal program control labels and mapped out along with intermediate code segments.
Multi-dimensional array is conceptually broken down into nested one-dimensional arrays for simplicity. For example:
DATA = ARRAY [1..10, RED..BLUE] of BOOLEAN;is the same as
DATA = ARRAY [1..10] of ARRAY [RED..BLUE] of BOOLEAN;
Procedure and function declaration is conceptually treated as a procedure/function identifier pointing to a TYPE which contains the parameter information. In other words, procedure and function are types. The scope of procedure/function declaration is better illustrated by the following example :
... (* level n *) P1(3, TRUE, FUNCID); ... Procedure P1(X1 : integer; Y1 : BOOLEAN; FUNCTION Z1(A1 : INTEGER; A1 : BOOLEAN) : INTEGER); (* level n+1 *) BEGIN IF Y1 THEN ... ELSE IF Z1(10,TRUE) <> 0 THEN ... END; ... (* level n *) ...
The following aspects need particular attention :
The second phase of semantic processing will involve at least the following tasks :
There is no easy and clean way to handle errors. However, for practical reasons, this is so essential a feature for the users to get as much information as possible in each compilation process.
Warning messages and Error messages are implemented at this stage. In general, only warning message will be given as long as the compilation process could go on. Error handling routine is treated in its most primitive way. That is, proper flag will be set and error number will be given. No recovery is attempted but proper termination is achieved.
There are two useful approaches to deal with errors in the future :
There are two more relevant issues:
The experiences gained from this project are rewarding. Continuing efforts along this direction are expected.
My special thanks go to Dr. James L. Peterson for his helpful discussion and great patience on many unexpected interruptions. Without such support this project could not proceed.
Home | Comments? |