.pl 72
.ND
.sp 5
.TL
MACTAB
.sp
Multiple Assembly-language Compiler
.br
Table Formatter.
.sp 2
.AU
Ross Nealon
.sp
.AI
University of Wollongong.
.bp
.NH 1
GENERAL DESCRIPTION.
.PP
MACTAB is one of the companion programs to the MAC cross-assembler.
Its purpose is to help the user format and produce a description file
for the MAC cross-assembler.
.PP
This file is expected to be an absolute binary data file, containing a
concise description of the target machine's architecture, the desired
format of the assembly-language source line, and other necessary details.
.PP
All error messages are reported to the standard output, and are flagged
with the line number of the description line in error.
Any error will abort production of the description file.
.bp
.NH 1
TERMINOLOGY.
.IP "r-file: " 12
An r-file is a file containing a formatted description of a machine for
the MAC cross-assembler.
This is MACTAB's only useful output.
.IP "opcode: " 12
An opcode symbolic is the symbolic label chosen to represent a particular
instruction.
Thus - 'sub' may be chosen to mean the subtraction instruction.
.IP "class: " 12
The opcode class is a number which describes the category of instructions
to which this particular symbolic belongs.
Classes may represent addressing modes, or instruction types, such as
branches, loads and stores.
.IP "descriptor: " 12
A format descriptor is a concise description of a binary instruction,
giving the location of the argument fields within the instruction, and the
width of the fields.
There must exist one format descriptor for each possible binary instruction
format.
.IP "picture: " 12
The argument picture is the part of the source line after the symbolic
opcode.
The picture contains expressions, which are evaluated to form arguments to
the instruction being assembled.
The picture can be used to recognise different classes of opcodes.
Thus - '#123' might mean immediate mode, while '123' might mean the contents
of address 123.
.IP "labels: " 12
Labels are defined as standard MAC labels, consisting of at least one
alpha character from the set {a-z . _ @}, optionally followed by one to
seven alphanumerics from the set {a-z . _ @ 0-9}.
MACTAB allows the user to pre-define labels, that is - to assign a value
to a label.
Such a label cannot thereafter be redefined, and has its assigned value in
any expression.
.IP "literals: " 12
Literals are special labels that are defined as reserved and have no value.
Literals cannot be used in expressions, but are useful for recognising
different argument pictures.
Thus, if 'x' is defined as a literal, the picture 'expr,x' indicates that
an expression, then a comma, then the literal 'x' are required, in that
order, for a successful parse of that source line.
.bp
.NH 1
SYNOPSIS.
.PP
mactab source r-file
.sp 2
.IP " source: " 12
Input description source code.
.IP " r-file: " 12
Output filename.
.sp 2
.PP
Both arguments are required.
MACTAB reads from 'source' until end-of-file or the 'end' section is
encountered.
MACTAB then collects the individually compiled sections, and writes them to
the file 'r-file'.
Any error will abort production of the r-file, but the scan of the source
will continue.
.sp 3
.NH 1
SOURCE FORMAT.
.PP
The r-file source description is made up of several distinct parts, two of
which are optional.
These parts are called SECTIONS.
The order of the sections is not critical, but some sections require that
other sections be previously defined.
It is suggested that the user follow the given section ordering and internal
arrangement.
.PP
A section has the general form:-
.sp
section-name
.br
\&.
.br
\&.
.br
<description>
.br
\&.
.br
\&.
.br
%
.sp 2
.PP
The '%' at the end indicates end-of-section.
The section-names and end-of-section character must begin in column one of
the source line.
.bp
.NH 1
HEADER SECTION.
.PP
The header section generally describes the target machine's architecture
to MAC.
This section produces the header record at the beginning of each r-file.
.PP
MACTAB allows the definition of special pseudo opcodes for the definition of
constants during the assembly process.
These opcodes, called Define Constant (dc) opcodes, can be given one extra
identifying character, appended to the 'dc'.
.IP " e.g.-" 16
dc4 dcf dc.
.br
dcb dch dc@
.PP
The identifying character must be a legal alphanumeric, so that the whole dc
opcode forms a legal MAC label.
.sp 2
.PP
Each dc can be constructed to define constants of a particular length and/or
format.
Thus - 'dcb' may be declared to define one byte of storage with a constant in
it, 'dch' to define a half-word constant, and 'dcw' to define a full-word
constant.
The source line required is:-
.IP "" 12
dc <character> <format>
.PP
The required <format> field is described in detail later in this manual.
Please refer to the FORMATS section for this description.
.sp 2
.PP
MACTAB and MAC allow the user to define a default dc, which has no
identifying character.
The source line required is:-
.IP "" 12
defmt <format>
.PP
Where <format> is as described later.
.sp 2
.PP
Program counter incrementation is important for addressing relative to the
program counter.
The pc can be pre-incremented (before an instruction is executed) or
post-incremented (after an instruction is executed).
Declaration is as follows:-
.IP "" 12
pc post OR
.br
pc pre
.bp
.PP
The assembler requires the width of the basic address unit (byte) in bits.
Declaration is:-
.IP "" 12
byte <width>
.PP
Where <width> is a constant (numeric) with a value equal to the byte width.
.sp
.PP
Similarly - the assembler needs to know the number of bytes per word.
Its definition is:-
.IP "" 12
word <length>
.PP
Where <length> is the number of bytes per word.
.sp 2
.PP
The user is requested to supply some opcode value that is treated as illegal
by the target machine.
If none is supplied - zero (0) is assumed.
The declaration is:-
.IP "" 12
ii <value>
.sp 2
.PP
MAC allows the user to supply a string of up to thirty characters that is
printed at the head of each page generated by MAC (listings and so on).
This is optional.
.IP "" 12
mac "Title string"
.sp 2
.PP
MAC allows the user to give a page length in lines for listings and so on.
This is specified as a number of lines.
The default is 60.
.IP "" 12
page <lines>
.bp
.PP
A typical declaration is illustrated below.
The format descriptors on the define constant lines are described later.
.IP "" 12
.sp
header
.br
pc post
.br
byte 8
.br
word 2
.br
dc b a:8
.br
dc f a:16
.br
defmt a:8
.br
mac "Dummy machine"
.br
page 60
.br
ii 0xff
.br
%
.bp
.NH 1
LITERALS SECTION.
.PP
The literals section allows definition of the special class of labels called
literals.
Literals are standard MAC labels that have been defined as reserved, and have
no value assigned to them.
Literals cannot be used in expressions, but are useful in argument pictures
for recognising different pictures.
A literal cannot be redefined as a label with a value.
A typical definition may be:-
.IP "" 12
.br
literals
.br
x
.br
y
.br
a
.br
%
.sp
.PP
The literals section (if needed) MUST appear before the args section.
This section is optional and may be totally omitted if desired.
.sp 3
.NH 1
LABELS SECTION.
.PP
This is the only other optional section.
This section allows the user to pre-define up to sixteen labels and assign
them values.
These labels will always be defined to all users of the r-file, and so
provide a mechanism for remembering frequently used addresses or values, such
as subroutines in the monitor Read-Only-Memory.
The labels are defined to have non-relocatable values.
This section may appear anywhere in the source description.
The definition is:-
.IP "" 12
label <value>
.PP
Where <value> is a legal MAC constant.
A typical section definition is:-
.IP "" 12
.sp
labels
.br
adc 4
.br
nul 0
.br
tty 0xc0fe
.br
mask 0777
.br
%
.bp
.NH 1
FORMATS SECTION.
.PP
This section describes the format of the binary instruction (or data for a
define constant) to MAC.
Generally - each class of instructions (zero page, pc relative, register
indexed etc) will have a separate format descriptor.
Each descriptor can define one instruction or data format.
.PP
Each descriptor is made up of field descriptors, called subset descriptors.
Each subset is defined as follows:-
.IP "" 12
<subset-name>:<subset-width>
.sp
.PP
Each subset name is one identifying character.
The letter 'o' means opcode value, '!' means the current value of the program
counter, a '#' means a constant will follow, and the letters 'a' to 'm' mean
the values of argument expressions one to thirteen respectively.
.PP
Prefixing the subset names 'a' to 'm' with the letter 'p' indicates that this
argument should be assembled program counter relative.
That is - the value assembled is the value of the expression minus the
current value of the location counter.
Prefixing the subset names 'a' to 'm' with the special field 'r<n>', where
<n> is a decimal constant, implies that the argument expression should be
assembled with the most significant <n> bits of the expression's value
swapped with the least significant <n> bits.
This format prefix assumes that the maximum length of the expression's value,
in bits, will be two times the constant <n>.
.PP
The name 'o' means that the value of the opcode currently being assembled is
placed here.
Thus 'o:8' implies that the opcode value should be assembled in the next 8
bits of memory.
Similarly, 'a:16' means assemble argument 'a' (argument one) in the next 16
bits, '#123:16' means assemble the decimal constant 123 in the next 16 bits,
and 'pa:8' means argument one made pc relative in the next 8 bits, while
the descriptor 'r8a:16' means assemble argument one in 16 bits, with the
first 8 bits and the next 8 bits swapped.
.PP
The full descriptor is made up of several subset descriptors grouped
together.
.IP "" 12
.sp
|------------------------------|
.br
| opcode | arg 1 |    arg 2    |
.br
|------------------------------|
.br
    8        8         16    (bits)
.sp 2
The above instruction may be described as:-
.sp
o:8a:8b:16
.bp
.PP
Each descriptor line must begin with the number of arguments to this
instruction, followed by blanks/tabs, then the format descriptor.
.IP " e.g.- " 12
2 o:8a:8b:16
.sp
.PP
A typical definition may be:-
.IP "" 12
.br
formats
.br
0 o:8
.br
2 o:8b:4a:12
.br
1 #0xf:4o:4a:8
.br
1 o:6a:12
.br
1 a:4o:4#0x12:8
.br
1 o:8r4a:8
.br
1 o:8pa:8
.br
%
.sp 2
.PP
MACTAB scans each format descriptor for validity, and reports any
inconsistencies.
The total width of the format descriptor in bits must be an exact multiple of
the width of the basic address unit (byte).
For this reason, the header section must be defined before the formats
section.
.PP
Each descriptor is assigned a number, starting at zero (0) and being
incremented by one for each new descriptor.
These logical numbers are the only method of referring to a format
descriptor.
.bp
.NH 1
OPCODES SECTION.
.PP
This section is used to describe all of the possible opcode symbolics (except
the pseudo opcodes, which are always defined) and the values of each
symbolic.
The MAJOR CLASS of opcodes must first be defined.
This is the maximum number of different values any opcode symbolic may have.
Each different argument picture may be used to select a new opcode class (a
particular value out of a class of values).
Thus - the construct
.sp
.DS
\&'sub expression'
.DE
.sp
.PP
may select a value from a list for the 'sub' instruction, while the construct
.sp
.DS
\&'sub #expression'
.DE
.sp
.PP
may be set to select another value from the same list for the 'sub'
instruction.
The table may be laid out in columns, each column corresponding to a class,
and each row corresponding to an opcode symbolic.
.IP "" 8
.br
opcodes
.br
class <n>
.br
add 1 value<0> value<1> . . . value<n-1>
.br
sub 1 value<0> value<1> . . .
value<n-1>
.br
jmp 2 value<0> value<1> . . . value<n-1>
.br
%
.sp
.PP
This example may have class<0> values defined for zero page addressing,
class<1> for immediate mode, and so on.
A logical connection can be made between columns and argument pictures.
The normal practice is to allow one column for each type of argument picture.
The single number adjacent to each opcode symbolic is the number of the format
descriptor to use for this instruction if no other is selected.
(See the args section for the method of selecting classes of opcode values and
format descriptors depending on the argument pictures.)
.bp
.PP
If a class of opcode values (a particular argument format) is to be made
illegal for an opcode symbolic, the user must make the opcode value for that
symbolic and class equal to the illegal instruction value defined in the
header section.
Zero is the default.
If MAC is requested to assemble an instruction with an illegal opcode value,
it will reject the source line and terminate in error.
As an example, the 'sub' symbolic may be allowed to have class 1, 2 and 4
argument pictures, but not classes 0, 3 and 5.
The declaration may be:-
.IP "" 8
sub 1 0 0x10 0x20 0 0x40 0
.sp
.PP
Up to 256 different opcode symbolics may be defined, with no limit on the
maximum class size.
.PP
To make this section a little more understandable, the reader is referred to
the two appendices, which contain r-file descriptions for the Motorola 6800
and the 6502 micro-computers.
Careful study of these should lead to a better understanding of the opcodes
section.
.bp
.NH 1
ARGS SECTION.
.PP
This section is the most important section of the six sections of the table
formatter.
This section takes the user's description of the argument pictures and, using
a top-down recursive descent algorithm, builds a finite-state parser table for
the cross assembler.
Each picture is made up of literals, the reserved keyword 'expr' meaning an
expression, delimiters (commas and so on) and some required characters, such
as a '#' meaning immediate mode, '$' meaning zero-page addressing, and '@'
meaning indexed.
These characters are 'required' in the sense that they must be present in the
argument picture for MAC to recognise that format of picture.
.PP
MACTAB recognises the keyword 'expr', and wherever it occurs, MACTAB
substitutes a call to a pre-defined expression parser.
.PP
The user can associate a series of actions to perform upon recognition of an
argument picture.
The actions currently implemented are:-
.IP " 1)" 8
Select a new format descriptor
.IP " 2)" 8
Select a new class of opcode values for this symbolic
.PP
Typical argument pictures could be:-
.IP "" 12
.br
expr
.br
expr , x
.br
expr , ( y )
.br
# expr
.br
a , expr
.br
( $ expr ) , y
.sp
.PP
Actions are selected by adding four constants after the picture, on the same
source line.
The first describes which actions to take (if any).
Each bit in this constant selects a particular action.
If a bit is set, MAC will try to perform the requested action.
The remaining three constants are arguments to the actions, specifying such
things as which new format descriptor, and which new opcode class.
The bits are numbered from 0 as the least-significant (right-most) bit.
If bit zero is set, MAC will select a new format descriptor, and assume the
fourth constant is the number of the new format descriptor.
If bit one is set, MAC will select a new class of opcode values, that class
being the value of the third constant.
Selection actions are optional.
The actions are introduced by preceding the four constants with a brace '{'.
All four constants are needed after the brace.
Constants should be separated by blanks and/or tabs.
.bp
.PP
Several pictures should always be defined by the user.
The picture " " (a string) should be defined, for the dc and title pseudo
opcodes.
A blank line is equivalent to the case of an instruction having no argument
picture.
.PP
As a special case, a label may appear in an argument picture; this indicates
that a label (not an expression) is required at that point.
See appendix 1 for examples of real definitions.
.sp 5
.NH 1
END SECTION.
.PP
The end section is needed to actually create the named r-file.
Upon recognition of this section, MACTAB collects the compiled sections, and
writes them in their correct order onto the r-file.
Any error other than a missing end section will prevent the r-file from being
produced.
If an end-of-file is encountered before the end section, MACTAB reports this
as a warning only, and assumes an end section.
In this case only, an r-file will still be produced.
No terminating '%' is required for this section.
.bp
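.PP
Returning to the FORMATS section, the Python sketch below illustrates how a
format descriptor such as 'o:8a:8b:16' might be turned into address units.
It is not taken from MAC or MACTAB.
The names SUBSET, parse_const, swap_halves and assemble are hypothetical, and
the sketch assumes that fields are packed most significant first, that a
constant with a leading zero is octal, that a 'p' prefix is applied before an
'r<n>' swap, and that the caller supplies the location counter value to be
used for 'p' and '!'.
.DS
```python
import re

# One subset descriptor: either '#<constant>' or an optional 'p' and/or
# 'r<n>' prefix followed by a name ('o', '!' or 'a'..'m'), then ':<width>'.
SUBSET = re.compile(
    r"(?:#(?P<const>0[xX][0-9a-fA-F]+|[0-9]+)"
    r"|(?P<pcrel>p)?(?:r(?P<swap>[0-9]+))?(?P<name>[o!a-m]))"
    r":(?P<width>[0-9]+)"
)


def parse_const(text):
    """Parse a constant; the leading-zero octal rule is an assumption."""
    if text[:2].lower() == "0x":
        return int(text, 16)
    if len(text) > 1 and text[0] == "0":
        return int(text, 8)
    return int(text, 10)


def swap_halves(value, n):
    """Swap the upper n bits with the lower n bits of a 2n-bit value."""
    mask = (1 << n) - 1
    return ((value & mask) << n) | ((value >> n) & mask)


def assemble(descriptor, opcode, args, pc, byte_width=8):
    """Pack one instruction into a list of address units (bytes)."""
    bits = 0
    total = 0
    pos = 0
    while pos < len(descriptor):
        m = SUBSET.match(descriptor, pos)
        if m is None:
            raise ValueError("bad subset descriptor at column %d" % pos)
        pos = m.end()
        width = int(m.group("width"))
        if m.group("const") is not None:
            value = parse_const(m.group("const"))
        else:
            name = m.group("name")
            if name == "o":
                value = opcode
            elif name == "!":
                value = pc
            else:
                # 'a' is argument one, 'b' argument two, and so on.
                value = args[ord(name) - ord("a")]
            if m.group("pcrel"):
                value -= pc
            if m.group("swap"):
                value = swap_halves(value, int(m.group("swap")))
        # Truncate the value to the field width and append it.
        bits = (bits << width) | (value & ((1 << width) - 1))
        total += width
    if total % byte_width != 0:
        raise ValueError("descriptor width is not a multiple of the byte width")
    count = total // byte_width
    unit_mask = (1 << byte_width) - 1
    return [(bits >> (byte_width * (count - 1 - i))) & unit_mask
            for i in range(count)]
```
.DE
.PP
Under these assumptions, assemble("o:8a:8b:16", 0x3C, [0x12, 0xBEEF], 0)
yields the four bytes 0x3C 0x12 0xBE 0xEF, and
assemble("o:8r8a:16", 0x7E, [0x1234], 0) yields 0x7E 0x34 0x12, showing the
two halves of the argument exchanged.
A descriptor whose total width is not a multiple of the byte width is
rejected, mirroring the check that MACTAB is described as making.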