AUSAM/source/mac/macdoc/sys







	M A C   S Y S T E M    D E S C R I P T I O N.
	- - -   - - - - - -    - - - - - - - - - - - 


		Author  :  Ross Nealon

		Date    :  1/11/78

		Site    :  University of Wollongong.



	This document comprises the system description of the
	MAC cross-assembler as it is running at the University
	of Wollongong on an Interdata 7/32 computer.

	This document forms the third and last part of the
	description of the MAC cross-assembler.  The first
	part describes MAC itself, the second describes the
	table formatter, and this part describes the system
	from the implementor's point of view.


FILE  mac.h
----  -----

   This file contains standard definitions of structures and
special constants.  The only really tuneable constants are
described under the heading 'misc. descriptors.'  It would be
safer to leave the others alone.

   Struct st is the definition of one symbol table entry.
It contains the symbol's name (8 chars), its value, mode flags
(for relocatable/absolute value, global, etc), and a link pointer.
The symbol table is organised as a linked list structure, with
the appropriate list being chosen by a hashing algorithm.  At
present the hash is simply the first character of the symbol.
Access is through the monitor routine 'lscan'.

   Struct it forms the heart of the linkage between the first
pass and the second pass of MAC.  These 'records' exist, one
per line of source code, and are essentially a coded form of
the source.  These 'intercode' records are generated to allow
fast processing in the second pass.

   The flags field describes the nature of the intercode record.
It can be a null line, a line with a pseudo opcode, or with a
machine defined opcode.  The label field indicates the presence
(-1 implies absence) of a label tag on the source line.  Its
value is the index into the symbol table for that label.
Op indicates which opcode/pseudo-opcode exists on the line.
Loc indicates the current location counter, Selc the selection
actions to take when assembling code, and Opr the arguments to
the instruction.
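
   As an illustration only, an intercode record along these lines
might be declared as below.  The member names, the types, the limit
MAXOPR and the three flag values are assumptions made for the sketch.
Only the meaning of each field and the use of -1 for 'no label' come
from the description above.

```c
#define NOLABEL	(-1)	/* from the text: -1 implies no label */
#define MAXOPR	4	/* assumed limit on arguments per line */

enum { I_NULL, I_PSEUDO, I_OPCODE };	/* assumed flag values */

struct it {			/* one intercode record per source line */
	int	i_flags;	/* null line, pseudo opcode or machine opcode */
	int	i_label;	/* symbol table index of the label, or -1 */
	int	i_op;		/* which opcode/pseudo-opcode is on the line */
	int	i_loc;		/* the current location counter */
	int	i_selc;		/* selection actions for code assembly */
	int	i_opr[MAXOPR];	/* arguments to the instruction */
};

/* Build the record for a line that carries no label and no opcode. */
static struct it null_record(int loc)
{
	struct it r = { I_NULL, NOLABEL, 0, 0, 0, { 0 } };

	r.i_loc = loc;
	return r;
}

/* True if the source line carried a label tag. */
static int has_label(const struct it *r)
{
	return r->i_label != NOLABEL;
}
```

   Pass 2 would then test has_label() rather than re-scan the source
text, which is the point of keeping the records.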

   Struct lt is the table of location counters.  They have a
current value, start address for the code segment, end address
for the segment, and two pointers.  Next points to the next
free address in memory to put assembled code (during pass 2).
Rel_f is the address of the start of the segment in memory
during pass 2 assembly.  This is so an 'org' instruction
can set the next pointer correctly.

   Struct bt is a standard UNIX buffer declaration.

   Struct ht describes the header record at the start of each r-file.
This record describes the number of literal labels, the number of
pre-defined labels, format descriptors, opcodes, the actual format
descriptors, program counter pre/post increment flag (for relative
addressing), lengths of a byte in bits, word in bytes, opcode table
length, parser table length, parser start address, page length (for
listings), an illegal instruction value, and a string to write at
the top of each page of output generated.

   Struct od describes the opcodes.  Each has its name, format,
and a list of values it can have.  If the value selected is the
ii (illegal instruction) value, MAC will report that an illegal
form of that instruction is being assembled.

   Struct fd describes the format descriptors - by length of 
instruction, details of instruction, and number of args to the instruction.

   Struct tbl is the parser table entry.  Sym is one of <n> legal symbols
that can appear next in the parse of the source line.  Mem further
describes the symbol.  Next is the next parser table entry to use
if the parser Sym matches the input Sym.  Act is an action to perform
on a match, and Arg is its argument (if any).


FILE mac00.c
---- -------

main():
   The main program decodes the argument list, sets options into
the variable msflag, and stacks each non-option argument as a filename.
Files are opened, tables are read in, pass 1 is started.  If any
errors exist, assembly is halted.  Location counters and files
are adjusted, and pass 2 started.  Code output routines are then
called and MAC exits.

tbl():
   Reads from the named fd a set of binary tables describing
the target machine.

adjust():
   Counts up memory needed for assembly, sbrk's for the core,
and sets up the pointers appropriately.
   All location counter values are checked against their
current limits, and the maximum taken as the size of the
code segment to generate.  The size of the segment (end
address minus start address) is added into a running sum,
and the location counter reset to zero.  MAC then brk's
for the memory required to do the code assembly, then
sets a pointer to the start of the segment into the
structure member l_rel_f.  This is so an Org instruction
can reset the pointer for code generation correctly.
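
   The bookkeeping described above might be sketched as follows.
This is not the code in mac00.c.  Every member name except l_rel_f is
an assumption, addresses are assumed to count bytes, calloc() stands
in for the sbrk call, and org() is a hypothetical helper showing why
l_rel_f is kept.

```c
#include <stdlib.h>

struct lt {			/* one location counter */
	int	l_cur;		/* current value */
	int	l_start;	/* start address of the code segment */
	int	l_end;		/* end address of the segment */
	char	*l_next;	/* next free byte for assembled code */
	char	*l_rel_f;	/* start of the segment in memory */
};

/*
 * Size every segment, allocate one block for all of them, and point
 * each counter at its own slice.  Returns the block, or NULL.
 */
static char *adjust(struct lt *tab, int n)
{
	int i, total = 0;
	char *core, *p;

	for (i = 0; i < n; i++) {
		if (tab[i].l_cur > tab[i].l_end)	/* take the maximum */
			tab[i].l_end = tab[i].l_cur;
		total += tab[i].l_end - tab[i].l_start;	/* running sum */
		tab[i].l_cur = 0;			/* reset for pass 2 */
	}
	core = calloc(total > 0 ? total : 1, 1);
	if (core == NULL)
		return NULL;
	for (p = core, i = 0; i < n; i++) {
		tab[i].l_rel_f = tab[i].l_next = p;
		p += tab[i].l_end - tab[i].l_start;
	}
	return core;
}

/* An 'org' to address addr repositions l_next relative to l_rel_f. */
static void org(struct lt *lp, int addr)
{
	lp->l_next = lp->l_rel_f + (addr - lp->l_start);
}
```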



FILE mac10.c
---- -------

pass1():
   Pass1 is a finite state automaton designed to parse input source
and create intercode records.  An initial symbol is fetched, and
the automaton starts.  Each time end-of-line is recognised, the
parser fetches a new line and new input symbol, and restarts itself.
Parser symbols are scanned linearly until a match occurs.  The
appropriate action is then performed.  If no legal symbol is seen,
the parser will encounter special symbol MCH, and simulate a symbol
match.  This is to allow the error action to be performed.
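
   A minimal sketch of the table scan, under stated assumptions, is
given below.  The member names follow the description of struct tbl
under mac.h, but the t_ prefix, the value chosen for MCH, the action
codes and the routines match() and parse() are all invented for the
illustration.

```c
#define MCH	(-1)	/* assumed encoding of the match-anything symbol */

enum { A_NOP, A_OUT, A_ERR };	/* assumed action codes */

struct tbl {			/* parser table entry */
	int	t_sym;		/* symbol that may legally appear next */
	int	t_mem;		/* further description of the symbol */
	int	t_next;		/* next table index to use on a match */
	int	t_act;		/* action to perform on a match */
	int	t_arg;		/* argument to the action, if any */
};

/*
 * Scan linearly from index 'state' until an entry matches 'sym'.
 * An MCH entry matches any symbol, so a group that ends in MCH
 * always stops the scan.  Returns the index that matched.
 */
static int match(const struct tbl *tp, int state, int sym)
{
	while (tp[state].t_sym != sym && tp[state].t_sym != MCH)
		state++;
	return state;
}

/*
 * Drive the automaton over n symbols.  Returns the final state,
 * or -1 as soon as the error action is selected.
 */
static int parse(const struct tbl *tp, int state, const int *syms, int n)
{
	int i, k;

	for (i = 0; i < n; i++) {
		k = match(tp, state, syms[i]);
		if (tp[k].t_act == A_ERR)
			return -1;
		state = tp[k].t_next;
	}
	return state;
}
```

   Because every group of alternatives ends in an MCH entry, the scan
needs no separate 'not found' test.  The error is simply the action of
the last entry.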

newline():
   General initialisation subroutine, called to get the next line
of source and fetch the first token.



FILE mac11.c
---- -------

   This file contains the handlers for the pseudo opcodes.  Most of the
opcodes change the program counter or set up special information.
Adding a new pseudo-opcode means changing three tables in mac40.c,
and adding new handlers here.  All of these routines start with 'pr'.



FILE mac20.c
---- -------

pass2():
   This is the second pass of the assembler.  It re-reads intercode
records, and switches to the appropriate code generator, gpcode for
pseudo opcodes, or gocode for opcodes.  This routine also invokes
the listing subroutines - listings are made dynamically as code is
generated, but code dumps and a.out's are made after the assembly.



FILE mac21.c
---- -------

gocode():
   Stacks the values of the argument expressions, selects the
appropriate format descriptor to use, and calls the formatter.

gpcode():
   Switches to the appropriate 'pe' pseudo-opcode routine.



FILE mac22.c
---- -------

   All routines here are analogous to those in mac11.c.



FILE mac23.c
---- -------

plist():
   Listing generator for pseudo opcodes.  Look at a listing
and the way the code works will be obvious.

olist():
   The same as plist(), but for opcodes.

tlist():
   The same as olist() and plist(), but for comment or other lines.



FILE mac24.c
---- -------

sdump():
   Symbol table checker, and output routine.  Dumps the symbol
table to the std. output, and to the a.out file (if any).

cdump():
   Dump code to the std. output, in complete segments.  Generate
a.out's as well (if any).

header():
   Put title and header info on listings.

newpage():
   Call header only if instruction is not eject or title - they
call header as well.

source():
   Print the current source line for listings.  The input source
is re-read in the second pass for listings only.  It is not
rescanned, just printed.



FILE mac30.c
---- -------

expr():
   General expression evaluator, allowing unary ops.

lvalue():
   Get one term (its value) for expr.



FILE mac31.c
---- -------

This file contains a string comparison subroutine, symbol table
management routines, a string search routine, and a binary to
decimal character conversion routine.



FILE mac32.c
---- -------

This contains a character to binary conversion routine that works
on general C-type numeric constants, a binary search compare routine,
and a print routine like prf1, but with leading zeros.
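
   As a sketch of what a conversion for C-type constants involves,
assuming that 'C-type' means a leading 0x for hexadecimal, a leading
0 for octal and decimal otherwise.  The names digit() and atob() are
hypothetical, and the ASCII character set is assumed.

```c
/* Value of a digit character, or -1 if c is not a hex digit. */
static int digit(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Convert a C-style numeric constant to binary.  Conversion stops
 * at the first character that is not a digit in the chosen base.
 */
static long atob(const char *s)
{
	int base = 10, d;
	long n = 0;

	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
		base = 16;
		s += 2;
	} else if (s[0] == '0') {
		base = 8;
		s++;
	}
	while ((d = digit(*s++)) >= 0 && d < base)
		n = n * base + d;
	return n;
}
```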



FILE mac33.c
---- -------

assemble():
   This routine takes the constant 'value' and assembles that into
the next free 'width' bits of memory in the assembly space.
If you are porting MAC to another machine (not a 32 bit Interdata)
this routine is probably the one that you will have to change a lot.
Fundamental nasties are:-  the constant WORDSIZ - size of an int in
bits, the macro BITMASK, for generating masks, and the method of
setting up an int of all one bits (-1).
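
   To show where the word size dependence comes from, here is a
portable sketch of packing a value into the next free bits.  It is not
the routine in mac33.c.  The names WORDSIZ and BITMASK are borrowed
from the text, but their definitions here, the array core[], the
counter nextbit and the choice of most significant bit first are all
assumptions.  Width must not exceed WORDSIZ, and no check is made
against the end of core[].

```c
#include <limits.h>

#define WORDSIZ	((int)(sizeof(unsigned) * CHAR_BIT))	/* bits in an int */
/* Mask of the low n bits.  A shift by the full word width is avoided. */
#define BITMASK(n)	((n) >= WORDSIZ ? ~0u : ((1u << (n)) - 1u))

static unsigned	core[64];	/* assumed assembly space, initially zero */
static int	nextbit;	/* index of the next free bit in core[] */

/*
 * Place the low 'width' bits of 'value' into the next free bits of
 * core[], crossing a word boundary when the value does not fit.
 */
static void assemble(unsigned value, int width)
{
	value &= BITMASK(width);
	while (width > 0) {
		int word = nextbit / WORDSIZ;
		int room = WORDSIZ - nextbit % WORDSIZ;	/* free bits in word */
		int n = width < room ? width : room;	/* bits stored now */
		unsigned piece = (value >> (width - n)) & BITMASK(n);

		core[word] |= piece << (room - n);
		nextbit += n;
		width -= n;
	}
}
```

   Here all one bits is written ~0u rather than -1, and WORDSIZ is
derived from sizeof instead of being a fixed constant.  These are the
two spots where the text says a port will need attention.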

format():
   Called once per instruction, formats the correct information
as per the format descriptor, for the current instruction.



FILE mac34.c
---- -------

pscan():
   Linear search for a pseudo opcode.

lscan():
   Hashed list search, and insertion routine.  This is the only
routine that can access labels in the symbol table for pass one.
Option LKP looks up a label in symtab.  If it does not exist, an
error is generated. Option DEF defines a label.  Pre-existence
and pre-definition (error) are checked.  Both options return the
index in the struct array for the label.
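
   The two options might behave roughly as sketched below.  The
argument list, the member names, the flag F_DEFINED, the table sizes
and the negative error returns are all assumptions.  The real routine
reports errors through the error routines rather than by return value.
Slot 0 of the array is left unused so that 0 can end a list.

```c
#include <string.h>

#define NSYMS		64	/* assumed table size */
#define LKP		0	/* look a label up */
#define DEF		1	/* define a label */
#define F_DEFINED	01	/* assumed mode flag */
#define E_UNDEF		(-1)	/* assumed error returns */
#define E_MULDEF	(-2)
#define E_FULL		(-3)

struct st {
	char	s_name[9];	/* 8 chars and a terminator */
	int	s_value;
	int	s_mode;
	int	s_link;		/* index of next entry on this list */
};

static struct st	symtab[NSYMS];
static int	nsyms = 1;	/* slot 0 is reserved */
static int	heads[128];	/* one list per first character */

static int lscan(const char *name, int opt, int value)
{
	int h = (unsigned char)name[0] % 128;
	int i;

	/* walk the list selected by the first character */
	for (i = heads[h]; i != 0; i = symtab[i].s_link)
		if (strncmp(symtab[i].s_name, name, 8) == 0)
			break;
	if (opt == LKP)
		return i != 0 ? i : E_UNDEF;
	if (i != 0 && (symtab[i].s_mode & F_DEFINED))
		return E_MULDEF;		/* pre-definition */
	if (i == 0) {				/* no pre-existing entry */
		if (nsyms >= NSYMS)
			return E_FULL;
		i = nsyms++;
		strncpy(symtab[i].s_name, name, 8);
		symtab[i].s_name[8] = '\0';
		symtab[i].s_link = heads[h];
		heads[h] = i;
	}
	symtab[i].s_value = value;
	symtab[i].s_mode |= F_DEFINED;
	return i;
}
```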

oscan():
   Binary search of the opcode table (if pscan fails).
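
   A binary search of this kind could look like the sketch below.
It assumes the opcode table is sorted by name.  The struct shown is an
abridged stand-in for struct od with assumed member names, and the
argument list is invented.

```c
#include <string.h>

struct od {			/* abridged opcode entry */
	const char	*o_name;
	int	o_fmt;
};

/* Search a table sorted by name.  Returns the index, or -1. */
static int oscan(const struct od *tab, int n, const char *name)
{
	int lo = 0, hi = n - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int c = strcmp(name, tab[mid].o_name);

		if (c == 0)
			return mid;
		if (c < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -1;
}
```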



FILE mac40.c
---- -------

This is all of the storage declaration for MAC.  A companion file
mac.x contains extern declarations of all of the vars. here.



FILE mac41.c
---- -------

getlin():
   Get one line of source from the input file.

getch():
   Do backslash character mapping.



FILE mac42.c
---- -------

getsym():
   Get next input symbol from source line. (Lexical scanner).



FILE mac43.c
---- -------

   This contains error routines.
   All of these should be self-explanatory.

	M A C T A B   T A B L E  F O R M A T T E R.
	- - - - - -   - - - - -  - - - - - - - - -






   MACTAB is the major companion routine to the MAC cross-assembler.

   It accepts a symbolic description of the target machine, and
produces a formatted binary r-file, suitable for loading by the
cross-assembler.


FILE mactab.h
---- ---------

   This contains a few definitions, and a structure declaration.
The constants NE1, NS1, NS3 describe the sizes of three structures
of information used in generating the parser for MAC.  The implementor
must alter these constants if he/she alters the parser structures.

   The constants NLIT, NSYM, NFMT describe the maximum possible number
of literal labels, pre-defined labels and format descriptors respectively.



FILE mactab0.c
---- ---------

   This is mainly storage declarations and a few initialisations.
This file contains the three pre-defined sections of the parser table.
For a further description - see the section THE MAC PARSER at the
end of these notes.



FILE mactab1.c
---- ---------

   Main program - opens and creates the input/output files, and
loops getting source lines.  It looks for a label and uses that
label as the name of a section that follows.  The address of the
section routine is loaded, and that routine called.
   A somewhat special case is made of end-of-file, and the main
routine returns upon encountering it.
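
   The label-to-routine step might be sketched with a table of
function pointers, as below.  The section labels and the three
stand-in routines are hypothetical.  The real label spellings are
given in the MACTAB manual, not here.

```c
#include <string.h>

struct section {
	const char	*s_label;	/* label that names the section */
	int	(*s_func)(void);	/* routine that handles it */
};

/* Stand-ins for the real section routines. */
static int header(void)   { return 1; }
static int literals(void) { return 2; }
static int opcodes(void)  { return 3; }

static const struct section sections[] = {
	{ "header",   header },
	{ "literals", literals },
	{ "opcodes",  opcodes },
};

/*
 * Load the address of the routine for 'label' and call it.
 * Returns the routine's result, or -1 for an unknown label.
 */
static int dispatch(const char *label)
{
	size_t i;

	for (i = 0; i < sizeof sections / sizeof sections[0]; i++)
		if (strcmp(label, sections[i].s_label) == 0)
			return (*sections[i].s_func)();
	return -1;
}
```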



FILE mactab2.c
---- ---------

   Header record description - the same idea as the main routine,
recognising a label, and performing the action associated with it.
Successive if statements are used, as the number of these labels is
small, and each handler is small, ensuring fast operation.
The code is self-explanatory.



FILE mactab3.c
---- ---------

   End section handler - checks for errors.  If there are any, no
r-file is built.  Otherwise, the tables are written in a pre-designated
order to the r-file.



FILE mactab4.c
---- ---------

   Literal section - accepts one label and adds it to the table
of literals thus far defined.  No check is made for a literal
defined more than once.  It will not affect MAC, but it is added
overhead in the scans of this table.



FILE mactab5.c
---- ---------

   Pre-defined labels section - same as literals, but gets a value
for the label, and copies it into a symbol table.  At MAC start time,
these labels are defined in the symbol table for use by the programmer.
Again - no check is made for multiple label definition, but MAC will
pick this up as an error at line zero.  The label is set to GLOBAL,
with a non-relocatable value (ABS).



FILE mactab6.c
---- ---------

   Opcode section - gets symbolic labels, the list of values to
be associated with each symbolic, and a format descriptor to be
used when no other is selected.
   The code is self-explanatory.



FILE mactab7.c
---- ---------

   Format descriptor section - decodes and checks format descriptors
for validity.  Refer to the MACTAB manual for a description of the
format descriptor mechanism.



FILE mactab8.c
---- ---------

   Argument section - this section decodes argument picture declarations,
and generates a finite-state parser table, for use by the pass1 MAC
automaton.  Refer to the section THE MAC PARSER at the end of these
notes for a description of the parser makeup.

   Treeinit() generates a dummy node in the tree which is equivalent
to the root node.  While source lines exist - do gentree().  This routine
makes one branch (actually a whole path down the tree) from the source
line description.  The tree is then converted to a linear state table
by the recursive descent routine ptable(), and then the table patched.
These patches are necessary here - they could not be generated by ptable()
without large amounts of nasty code.  The four sections of code are then
relocated and combined into a parser table.  The four sections are -

	1)  expression parser (a subroutine)
	2)  S1: source line before the arguments.
	3)  S2: parser for arguments (as generated)
	4)  S3: source line after the arguments.
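
   Relocating a section when it is combined with the others amounts
to adding its base index to every 'next' field, as in the sketch
below.  The abridged entry layout, the value -1 for 'no successor' and
the routine append() are assumptions.  Jumps from one section into
another (such as the jump to S3) would still have to be patched
separately.

```c
struct tbl {			/* abridged parser table entry */
	int	t_sym;
	int	t_next;		/* index within its own section, or -1 */
	int	t_act;
};

/*
 * Copy the n entries of 'sec' into 'out' starting at index 'base',
 * relocating each next index by 'base'.  Returns the new length.
 */
static int append(struct tbl *out, int base, const struct tbl *sec, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		out[base + i] = sec[i];
		if (sec[i].t_next >= 0)
			out[base + i].t_next += base;
	}
	return base + n;
}
```

   Calling append() once per section, each time passing the length
returned by the previous call, yields the combined table.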

   The remainder of the file consists of the service routines described.



FILE mactab9.c
---- ---------

   General service routines - getlin(): gets one source line from the
input file, getsym(): a lexical scanner, and routines essentially taken
from MAC.

THE MAC PARSER.
--- --- ------

   A description of the way in which the MAC automaton works is
essential in understanding the way in which the parser table is
made.  The user or implementor is urgently requested to read the
outline given, and to have a copy of the code by his/her side when
reading this.

   Take as example the argument pictures:-

		expr
		expr , expr
		# expr


   The parser reads the first picture, and generates the tree

			root
			 /
			/
		       /
		     exp
		     /
		    /
		   /
		 eol




   The tree is built using nodes of structure:

	struct	node	{
		int	n_sym;
		int	n_mem;
		struct	node	*n_alt;
		struct	node	*n_next;
		};

   in such a way that the n_alt pointer points to the next
alternative on the same level (the adjacent descendant of the
same parent), and the n_next pointer points down the tree
(simulating a branch).

   The second line is read, and gentree - realising that the symbol
exp appears on the next lowest level, descends the level and starts
to create alternates as follows:-

			root
			 /
			/
		       /
		     exp
		     / \
		    /   \
		   /     \
		 eol     ','
			   \
			    \
			     \
			     exp
			       \
				\
				 \
				 eol


   The third line generates:-

			root
			 / \
			/   \
		       /     \
		     exp     '#'
		     / \       \
		    /   \       \
                   /     \       \
                 eol     ','     exp
                           \       \
			    \       \
			     \       \
			     exp     eol
			       \
			        \
				 \
				 eol
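

   The three steps above can be reproduced by a routine of roughly
the following shape.  The node structure is the one given earlier.
The symbol codes EXP and EOL, the helper newnode() and the argument
list of gentree() are assumptions made for this sketch.

```c
#include <stdlib.h>

enum { EXP = 256, EOL };	/* assumed codes; characters are themselves */

struct node {
	int	n_sym;
	int	n_mem;
	struct	node	*n_alt;
	struct	node	*n_next;
};

static struct node *newnode(int sym)
{
	struct node *np = calloc(1, sizeof *np);

	if (np == NULL)
		abort();
	np->n_sym = sym;
	return np;
}

/*
 * Add one picture (n symbols, the last being EOL) as a path below
 * root.  Symbols already present on a level are reused; a symbol
 * not yet present is added as a new alternate at the end.
 */
static void gentree(struct node *root, const int *pic, int n)
{
	struct node *parent = root, **pp;
	int i;

	for (i = 0; i < n; i++) {
		for (pp = &parent->n_next; *pp != NULL; pp = &(*pp)->n_alt)
			if ((*pp)->n_sym == pic[i])
				break;
		if (*pp == NULL)
			*pp = newnode(pic[i]);
		parent = *pp;		/* descend one level */
	}
}
```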


   Effectively - when source lines are parsed by MAC, this tree
structure is descended one level for each input symbol.  If the
symbol does not exist on the current tree level, a syntax error
is reported.  The eol symbol is a nasty special case.  The tree
is generated so that its leaves are always eol symbols.  These
do not mean an eol symbol must be seen, rather they are patched
later to 'jump' to the section of the parser (pre-defined S3)
that handles the source line after the arguments.

   Ptable would convert this tree into the following table:-

	index		symbol		next
	-----		------		----

	0		exp		3
	1		'#'		10
	2		err

	3		','		6
	4		eol		?	(don't know where to go yet)
	5		err

	6		exp		8
	7		err

	8		eol		?
	9		err

	10		exp		12
	11		err

	12		eol		?
	13		err
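
   One recursive routine that yields indexes 0 to 13 exactly as
tabulated above is sketched here.  Whether the real ptable() works in
this order is an assumption.  The sketch emits each level as one group
with eol placed last, ends the group with err, and then descends into
each alternate in turn.  The symbol codes, struct ent, the fixed array
sizes and the statically built sample tree are all invented for the
illustration.

```c
#include <stddef.h>

enum { EXP = 256, EOL, ERR };	/* assumed symbol codes */
#define NONE	(-1)		/* 'next' not yet known */

struct node {
	int	n_sym;
	int	n_mem;
	struct	node	*n_alt;
	struct	node	*n_next;
};

struct ent { int e_sym; int e_next; };

static struct ent	table[64];
static int	ntable;

/* The tree for the pictures "expr", "expr , expr" and "# expr". */
static struct node e3 = { EOL, 0, NULL, NULL };
static struct node x3 = { EXP, 0, NULL, &e3 };
static struct node hash = { '#', 0, NULL, &x3 };
static struct node e2 = { EOL, 0, NULL, NULL };
static struct node x2 = { EXP, 0, NULL, &e2 };
static struct node comma = { ',', 0, NULL, &x2 };
static struct node e1 = { EOL, 0, &comma, NULL };
static struct node x1 = { EXP, 0, &hash, &e1 };
static struct node root = { 0, 0, NULL, &x1 };

/* Emit the level starting at 'lvl'; return the index of its group. */
static int ptable(struct node *lvl)
{
	struct node *np, *order[16];
	int i, n = 0, base = ntable;

	for (np = lvl; np != NULL; np = np->n_alt)	/* eol goes last */
		if (np->n_sym != EOL)
			order[n++] = np;
	for (np = lvl; np != NULL; np = np->n_alt)
		if (np->n_sym == EOL)
			order[n++] = np;
	for (i = 0; i < n; i++) {
		table[ntable].e_sym = order[i]->n_sym;
		table[ntable++].e_next = NONE;
	}
	table[ntable].e_sym = ERR;			/* end of group */
	table[ntable++].e_next = NONE;
	for (i = 0; i < n; i++)				/* now the subtrees */
		if (order[i]->n_next != NULL)
			table[base + i].e_next = ptable(order[i]->n_next);
	return base;
}
```

   Calling ptable(root.n_next) fills table[0] to table[13].  The eol
entries are left with NONE as their 'next', which corresponds to the
'?' entries above.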

   Following the table through you can convince yourself that
it is the same as the tree.  The table is then patched to look like:-

	index		symbol		next		action
	-----		------		----		------

	0		mch		3		call expr
	1		'#'		10		output '#'
	2		mch				syntax error

	3		','		6		output ','
	4		mch		14		no op
							(really 'goto S3')
	5		mch				syntax error

	6		mch		8		call expr
	7		mch				syntax error

	8		mch		14		no op
	9		mch				syntax error

	10		mch		12		call expr
	11		mch				syntax error

	12		mch		14		no op
	13		mch				syntax error

   The parser table thus generated is combined with the other three
parts (with the 'n_next' indexes being relocated) and the routine returns.