.TL
Assembler Reference Manual
.AU
John F. Reiser
.AI
.HO
.AU
Robert R. Henry\s-2\u*\d\s+2
.FS
\&\s-2\u*\d\s+2
Preparation of this paper supported in part
by the National Science Foundation under grant MCS # 78-07291.
.FE
.AI
Electronics Research Laboratory
University of California
Berkeley, CA  94720
.ND November 5, 1979
.NH
Introduction
.PP
This document describes the usage and input syntax
of the \s8UNIX VAX\s10-11 assembler \fIas\fP.
\fIAs\fP is designed for assembling the code produced by the
C compiler; certain concessions have been made to handle code written
directly by people, but in general little sympathy has been extended.
This document is intended only for the writer of a compiler or a maintainer
of the assembler.
.NH
Usage
.PP
\fIas\fP is used as follows:
.in +5
as
[ \fB\-LVWJR\fR ]
[ \fB\-d\fIn\fR ]
[ \fB\-DTC\fR ]
[ \fB\-t \fIdirectory\fR ]
[ \fB\-o \fIoutput\fR ]
[ \fIname\d\s-2\&1\s+2\u\fP ]
[ \fIname\d\s-2\&2\s+2\u ... \fP ]
.br
.in -5
.PP
The \fB\-L\fP flag instructs the assembler to save labels beginning with
an ``L'' in the symbol table portion of the output file.
Such labels are not saved by default, as the default action of the link
editor \fIld\fP is to discard them anyway.
.PP
The \fB\-V\fP flag tells the assembler to place its interpass temporary
file into virtual memory.
In normal circumstances, the system manager
will decide where the temporary file should lie.
Our experiments
with very large temporary files show that placing the temporary
file into virtual memory will save about 13% of the assembly time,
where the size of the temporary file is about 350K bytes.
Most assembler sources will not be this long.
.PP
The \fB\-W\fP flag turns off the reporting of all errors.
.PP
The \fB\-J\fP flag forces \s-2UNIX\s+2 style pseudo\-branch
instructions with destinations further away than a
byte displacement to be
turned into jump instructions with 4 byte offsets.
The \fB\-J\fP flag buys you nothing if \fB\-d2\fP is set.
(See \(sc 9.4)
.PP
The \fB\-R\fP flag effectively turns \fB.data\fP\fI n\fP
segment changing directives into \fB.text\fP\fI n\fP directives.
This obviates the need to run editor scripts on assembler source
to ``read\-only'' fix initialized data segments.
Uninitialized data (via \fB.lcomm\fP and \fB.comm\fP directives)
is still assembled into the data or bss segments.
.PP
The \fB\-d\fP flag specifies the number of bytes
which the assembler should allow for a displacement when the value of the
displacement expression is undefined in the first pass.
The possible values of \fIn\fP are 1, 2, or 4; the assembler uses 4 bytes
if \fB\-d\fP is not specified.
See \(sc 9.2.
.PP
Provided the \fB\-V\fP flag is not set,
the \fB\-t\fP flag causes the assembler to place its single temporary file
in the \fIdirectory\fP instead of in \fI/tmp\fP.
.PP
The \fB\-o\fP flag causes the output to be placed on the named file.
The output of the assembler is by default placed on
the file \fIa.out\fR in the current directory.
.PP
The input to the assembler is normally taken from the standard input.
If file arguments occur, then the input is taken
sequentially from the files \fIname\d\s-2\&1\s+2\u\fP,
\fIname\d\s-2\&2\s+2\u\fP...
This is not to say that the files are assembled separately;
\fIname\d\s-2\&2\s+2\u\fP
is effectively concatenated to \fIname\d\s-2\&1\s+2\u\fP,
so multiple definitions cannot occur amongst the input sources.
.PP
The \fB\-D\fP flag enables debugging information, provided that the
assembler has been compiled to have debugging information available.
.PP
The \fB\-T\fP flag causes a trace of each token read
by \fIas\fP to be printed.  This is long and boring, but useful when
debugging the assembler.
.NH
Lexical conventions
.PP
Assembler tokens include identifiers (alternatively, ``symbols'' or ``names''),
constants, and operators.
.NH
Identifiers
.PP
An identifier consists of a sequence of alphanumeric characters (including
period ``\|\fB.\fR\|'',
underscore ``\(ul'' and
dollar ``\|$\|'')
of which the first may not be numeric.
If the assembler has been compiled to support flexible length symbols,
identifiers may be (practically) arbitrarily long with all
characters significant;
otherwise, only the first NCPS
(a symbol defined in \fI/usr/include/a.out.h\fP, and normally 8)
characters are significant.
.NH 2
Constants
.NH 3
Simple constants
.PP
All integer constants are 64 bits wide and interpreted as two's
complement numbers.
64 bit wide integer constants (quads) are only partially supported
by the \s-2VAX\s+2 hardware, and are supported only to provide
immediate constants to \s-2VAX\s+2 instructions with quad operands.
Floating-point constants are 64 bits wide.
The digits are ``0123456789abcdefABCDEF'' with the obvious values.
.PP
An octal constant consists of a sequence of digits with a leading zero.
.PP
A decimal constant consists of a sequence of digits without a leading zero.
.PP
A hexadecimal constant consists of the characters ``0x'' (or ``0X'')
followed by a sequence of digits.
.PP
A single-character constant consists of a single quote ``\|\(fm\|''
followed by an \s8ASCII\s10 character, including \s8ASCII\s10 newline.
The constant's value is the code for the
given character.
.PP
A floating-point constant consists of the characters ``0f'', ``0d'',
``0F'', or ``0D'' followed by a sequence of characters which \fIatof\fP
will recognize as a floating-point number;
either ``e'', ``E'', ``d'', or ``D''
may be used to designate the exponent field.
.NH 3
String Constants
.PP
A string constant is defined using the same syntax and semantics as ``C''
beginning and ending with a ``"'' (double quote).
The \s8DEC\s10 assembler conventions for flexible string quoting are
not implemented.
All ``C'' backslash conventions are observed; the backslash conventions
peculiar to the \s-2PDP\-11\s+2 assembler are not observed.
Strings are known by their value and their length; the assembler
does not implicitly end strings with a null byte.
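.PP
As an illustration, the following sketch shows each notation described
above; the values are chosen only for this example, and the data
directives used are described in \(sc 8.3.
.DS
	.long	017, 15, 0xf	# octal, decimal, hexadecimal: all 15
	.byte	'A	# single-character constant: the code for A
	.double	0d1.5e2	# floating-point constant: 150.0
	.ascii	"hi\en"	# three bytes; no null byte is appended
.DE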
.NH 2
Operators
.PP
There are several single-character
operators; see \(sc 7.
.NH 2
Blanks
.PP
Blank and tab characters
may be interspersed freely between tokens, but may
not be used within tokens (except character constants).
A blank or tab is required to separate adjacent
identifiers or constants not otherwise separated.
.NH 2
Comments
.NH 3
Decadent Comments
.PP
The character ``\|#\|'' introduces a comment, which extends
through the end of the line on which it appears.
Comments starting in column 1,
of the format ``\|# \fIexpression string\fP\|'',
are interpreted as an indication that the assembler is now assembling
file \fIstring\fP at line \fIexpression\fP.
Thus, one can use the C preprocessor on an assembly language source file,
and use the \fI#include\fP and \fI#define\fP
preprocessor directives.
(Note that there may not be an assembler comment starting in column
1 if the assembler source is given to the C preprocessor, as it will
be interpreted by the preprocessor in a way not intended.)
Comments are otherwise ignored by the assembler.
.NH 3
C Style Comments
.PP
The assembler will recognize C style comments, introduced with
the prologue \fB/*\fP and ending with the epilogue \fB*/\fP.
C style comments may extend across multiple lines, and are the preferred
comment style to use if one chooses to use the C preprocessor.
.NH 1
Segments and Location Counters
.PP
Assembled code and data fall into three segments:  the text segment,
the data segment, and the bss segment.  The operating system makes
some assumptions about the content of these segments;  the assembler
does not.  Within the text and data segments there are a number of
sub-segments, distinguished by number (``text 0'', ``text 1'', .\|.\|.
``data 0'', ``data 1'', .\|.\|.\|).
Currently there are four subsegments each in text and data.
The subsegments are for programming convenience only.  Before writing the
output file, the assembler zero-pads each text subsegment to a multiple of four
bytes and then concatenates the subsegments in order to form the text segment;
an analogous operation is done for the data segment.
Requesting that the loader define symbols and storage regions is the only
action allowed by the assembler with respect to the bss segment.
Assembly begins in ``text 0''.
.PP
Associated with each (sub)segment is an implicit location counter which
begins at zero and is incremented by 1 for each byte assembled into the
(sub)segment.  There is no way to explicitly reference a location counter.
Note that the location counters of subsegments other than ``text 0''
and ``data 0'' behave peculiarly due to the concatenation used to form
the text and data segments.
.NH 1
Statements
.PP
A source program is composed of a sequence of
\fIstatements\fP.
Statements are separated either by new-lines
or by semicolons.
There are two kinds of statements: null statements
and keyword statements.
Either kind of statement may be preceded by
one or more labels.
.NH 2
Labels
.NH 3
Name (Global) Labels
.PP
A global label consists of a name followed
by a colon ``\|:\|''.
The effect of a name label is to assign the current
value and type of the location counter
to the name.
An error is indicated in pass 1 if the
name is already defined;
an error is indicated in pass 2 if the
value assigned changes the definition
of the label.
.PP
A global label is referenced by its name.
.PP
Global labels beginning with a ``\|L\|''
are discarded unless the \fB\-L\fP option
is in effect.
.NH 3
Numeric (Local) Labels
.PP
A numeric label consists of a digit \fI0\fP to \fI9\fP followed by a
colon (``\|:\|'').
Such a label serves to define temporary symbols of the form
``\fIn\fPb'' and ``\fIn\fPf'',
where \fIn\fP is the digit of the label.
As in the case of name labels, a numeric label assigns
the current value and type of the location counter
to the temporary symbol.
However, several numeric labels with the same digit
may be used within the same assembly.
References to symbols of the form
``\fIn\fPb''
refer to the first numeric label ``\fIn\|:\fP''
\fIb\fP\|ackwards from the reference;
``\fIn\fPf''
symbols refer to the first numeric label ``\fIn\|:\fP''
\fIf\fP\|orwards from the reference.
Such numeric labels tend to conserve the inventive powers of
the programmer.
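.PP
A minimal sketch of how such references resolve (the instructions are
chosen only for illustration):
.DS
1:	tstl	r0	# first definition of label 1
	jeql	1f	# forward: the second 1: below
	decl	r0
	jbr	1b	# backward: the first 1: above
1:	ret	# second definition of label 1
.DE
Both labels use the digit 1, yet each reference is unambiguous because
it names a direction.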
.NH 2
Null statements
.PP
A null statement is an empty statement (which may, however,
have labels).
A null statement is ignored by the assembler.
Common examples of null statements are empty
lines or lines containing only a label.
.NH 2
Keyword statements
.PP
A keyword statement begins with one of the many predefined
keywords of the assembler;
the syntax of the remainder depends
on the keyword.
All instruction opcodes are keywords.
The remaining keywords are assembler pseudo-operations,
also called directives.
The pseudo-operations are listed below with the syntax they require.
.NH 1
Expressions
.PP
An expression is a sequence of symbols representing a value.
Its constituents are identifiers, constants,
operators, and parentheses.
Each expression has a type.
.PP
All operators in expressions are fundamentally binary in
nature.
Arithmetic is two's complement and has 32 bits of precision.
There are four levels of precedence, listed here from
lowest precedence level to highest:
.IP (binary) 16
\|+\|, -\|
.IP (binary) 16
\||\|, \|&\|, \|^\|, \|!\|
.IP (binary) 16
\|*\|, \|/\|, \|%\|, \|!\|
.IP (unary) 16
\|-\|, \|!\|
.PP
All operators of the same precedence are evaluated strictly left to right,
except for evaluation order enforced by parentheses.
.NH 2
Expression operators
.PP
The operators are:
.IP + 16
addition
.IP \- 16
subtraction
.IP * 16
multiplication
.IP / 16
division
.IP % 16
modulo
.IP & 16
bitwise and
.IP \(bv 16
bitwise or
.IP ^ 16
bitwise exclusive or
.IP "> (or >>)" 16
logical right shift
.IP "< (or <<)" 16
logical left shift
.hc
.IP ! 8
\fIa\fR\|!\|\fIb\fR is \fIa \fBor \fR(\|\fBnot \fIb\fR\|);
i.e., the \fBor\fR of the first operand and
the one's complement of the second; most common use is
as a unary operator.
.PP
Expressions may be grouped by use of parentheses ``\|(\|\|)\|''.
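.PP
For example, a sketch using the precedence levels listed above
(the symbol names are hypothetical; \fB.set\fP is described in \(sc 8.4):
.DS
	.set	SIZE, 2+3*4	# * binds tighter than +: 14
	.set	AREA, (2+3)*4	# parentheses force the addition first: 20
	.set	LOW, 0x1234&0xff	# bitwise and: 0x34
	.set	CLR, 0xf0!0xf	# 0xf0 or (not 0xf): 0xfffffff0
.DE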
.NH 2
Types
.PP
The assembler deals with a number of types
of expressions.  Most types
are attached to keywords and used to select the
routine which treats that keyword.  The types likely
to be met explicitly are:
.IP undefined 8
.br
Upon first encounter, each symbol is undefined.
It may become undefined if it is assigned an undefined expression.
It is an error to attempt to assemble an undefined
expression in pass 2; in pass 1, it is not (except that
certain keywords require operands which are not undefined).
.IP "undefined external" 8
.br
A symbol which is declared \fB.globl\fR but not defined
in the current assembly is an undefined
external.
If such a symbol is declared, the link editor \fIld\fR
must be used to load the assembler's output with
another routine that defines the undefined reference.
.IP absolute 8
.br
An absolute symbol is defined ultimately from a constant.
Its value is unaffected by any possible future applications
of the link-editor to the output file.
.IP text 8
.br
The value of a text symbol is measured
with respect to the beginning of the text segment of the program.
If the assembler output is link-edited, its text
symbols may change in value
since the program need
not be the first in the link editor's output.
Most text symbols are defined by appearing as labels.
At the start of an assembly, the value of ``\|\fB.\fP\|'' is text 0.
.IP data 8
.br
The value of a data symbol is measured
with respect to the origin of the data segment of a program.
Like text symbols, the value of a data symbol may change
during a subsequent link-editor run since previously
loaded programs may have data segments.
After the first \fB.data\fR statement, the value of ``\|\fB.\fP\|''
is data 0.
.IP bss 8
.br
The value of a bss symbol is measured from
the beginning of the bss segment of a program.
Like text and data symbols, the value of a bss symbol
may change during a subsequent link-editor
run, since previously loaded programs may have bss segments.
.IP "external absolute, text, data, or bss" 8
.br
Symbols declared \fB.globl\fR
but defined within an assembly as absolute, text, data, or bss
symbols may be used exactly as if they were not
declared \fB.globl\fR; however, their value and type are available
to the link editor so that the program may be loaded with others
that reference these symbols.
.IP register 8
.br
The symbols
.DS
\fBr0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15\fP
\fBap fp sp pc\fP
.DE
are predefined
as register symbols.
In addition, the % operator converts an absolute value to type register.
.IP "other types" 8
.br
Each keyword known to the assembler has a type which
is used to select the routine which processes
the associated keyword statement.
The behavior of such symbols
when not used as keywords is the same as if they were absolute.
.NH 2
Type propagation in expressions
.PP
When operands are combined by expression operators,
the result has a type which depends on the types
of the operands and on the operator.
The rules involved are complex to state but
were intended to be sensible and predictable.
For purposes of expression evaluation the
important types are
.DS
undefined
absolute
text
data
bss
undefined external
other
.DE
The combination rules are then:
If one of the operands
is undefined, the result is undefined.
If both operands are absolute, the result is absolute.
If an absolute is combined with one of the ``other types''
mentioned above,
the result has the other type.
When an ``other type'' is combined with an explicitly
discussed type other than absolute,
it acts like an absolute.
.PP
Further rules applying to particular operators
are:
.IP +
If one operand is text-, data-, or bss-segment
relocatable, or is an undefined external,
the result has the postulated type and the other operand
must be absolute.
.IP \-
If the first operand is a relocatable
text-, data-, or bss-segment symbol, the second operand
may be absolute (in which case the result has the
type of the first operand);
or the second operand may have the same type
as the first (in which case the result is absolute).
If the first operand is external undefined, the second must be
absolute.
All other combinations are illegal.
.PP
.IP others
.br
It is illegal to apply these operators to any but absolute
symbols.
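.PP
These rules can be seen in a short sketch (the names are hypothetical):
.DS
	.text
first:	movl	$1,r0
last:
	.set	LEN, last\-first	# text \- text: absolute
	.set	NEXT, last+4	# text + absolute: text
	.set	OFF, 10\-2	# absolute \- absolute: absolute
.DE
An expression such as first+last would be rejected, since
neither operand of the + is absolute.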
.NH 1
Pseudo-operations (Directives)
.PP
The keywords listed below introduce statements that
influence the later operations of the assembler.
The metanotation
.DS
[ stuff ] .\|.\|.
.DE
means that 0 or more instances of the given stuff may appear.
The metanotation
.DS
( stuff )\|*\|\|\fIn\fP\|
.DE
means that exactly \fIn\fP occurrences of stuff must occur.
.PP
Boldface tokens are literals, italic words
are substitutable.
.PP
The pseudo\-operations listed below are grouped into functional
categories, and not alphabetically.
.NH 2
Interface to a Previous Pass
.in +5m
.NH 3
\&.ABORT
.PP
As soon as the assembler sees this directive, it ignores all further
input (but it does read to the end of file), and aborts the assembly.
No files are created.
It is anticipated that this would be used in a pipe interconnected
version of a compiler, where the first major syntax error would
cause the compiler to issue this directive, saving unnecessary
work in assembling code that would have to be discarded anyway.
.NH 3
\&.file \fIstring\fP
.PP
This directive causes the assembler to think it is in file \fIstring\fP
so error messages reflect the proper source file.
.NH 3
\&.line \fIexpression\fP
.PP
This directive causes the assembler to think it is on line \fIexpression\fP
so error messages reflect the proper source line.
.PP
The only effect of assembling multiple files specified in the command string
is to insert the
\fIfile\fP and \fIline\fP directives, with the appropriate values,
at the beginning of the source from each file.
.NH 3
Preprocessor Interface
.DS
\fI# expression string\fP
\fI# expression\fP
.DE
.PP
This is the only instance where a comment is meaningful to the assembler.
The ``\|#\|''
.ul 1
must
be in the first column.
This meta comment causes the assembler
to believe it is on line \fIexpression\fP.
The second argument, if included, causes the assembler to believe it is in
file \fIstring\fP, otherwise the current file name does not change.
.in -5m
.NH 2
Location Counter Control
.in +5m
.NH 3
\&\fB.align\fP  \fIexpression\fP
.PP
The location counter is adjusted (by assembling bytes containing
zeroes, if necessary) so that the \fIexpression\fP lowest bits
become zero.
Thus ``.align 2'' makes the location counter evenly divisible by 4.
The expression must be defined, absolute, nonnegative,
and less than 16.
(Note that the subsegment concatenation convention
and the current loader conventions may not preserve attempts at aligning
to more than 2 low-order zero bits.)
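.PP
A sketch of the effect, assuming the fragment begins at location 0 of
its subsegment (the label name is hypothetical):
.DS
	.data
	.byte	1	# occupies location 0
	.align	2	# three zero bytes pad to location 4
table:	.long	7	# assembled at location 4
.DE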
.NH 3
Subsegment switching
.DS
  \fB.data\fP [ \fIexpression\fP ]
  \fB.text\fP  [ \fIexpression\fP ]
.DE
.PP
These two pseudo-operations cause the
assembler to begin assembling into the indicated text or data
subsegment.  If specified, the expression must be defined and absolute;
an omitted expression is treated as zero.
A \fB.data\fP directive is treated
as a \fB.text\fP directive if the \fB\-R\fP assembly flag is set.
Assembly starts in the text 0 subsegment.
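.PP
As a sketch (the labels are hypothetical), text and data may be
interleaved in the source; the assembler collects each into its own
subsegment:
.DS
	.text		# same as .text 0
f:	ret
	.data	1	# switch to data 1
flag:	.byte	1
	.text		# resume text 0, immediately after f
g:	ret
	.data		# data 0, which precedes data 1 in the output
count:	.long	0
.DE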
.NH 3
\&\fB.org\fP  \fIexpression\fP
.PP
The location counter is set equal to the value of the expression.
The expression must be defined.
The value of the expression must be greater than the current value
of the location counter.
.NH 3
\&\fB.space\fP  \fIexpression\fP
.PP
\&\fIexpression\fP bytes of zeroes are assembled.
.in -5m
.NH 2
Initialized Data
.in +5m
.NH 3
Expression Initialized Data
.DS
  \fB.byte		\fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
  \fB.word		\fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
  \fB.int		\fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
  \fB.long		\fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.

  \fB.quad	\fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.

  \fB.float		\fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
  \fB.double	\fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
.DE
.PP
The \fIexpression\fP\|s in the comma-separated
list are truncated to the indicated size
(byte=8 bits,
word=16,
int=32,
long=32,
quad=64,
float=32,
double=64)
and
assembled in successive locations.
The expressions must be absolute.
The value assembled in bits 32-63 for \fB.double\fP is zero
if the expression is not of type double.
.PP
Except for \fB.quad\fP, \fB.float\fP and \fB.double\fP,
each expression may optionally be of the form
.DS
  \fIexpression\d\s-2\&1\s+2\u\fP  \fB:\fP  \fIexpression\d\s-2\&2\s+2\u\fP
.DE
In this case the value of \fIexpression\d\s-2\&2\s+2\u\fP
is truncated to \fIexpression\d\s-2\&1\s+2\u\fP
bits and assembled in the next \fIexpr\d\s-2\&1\s+2\u\fP-bit
field which fits in
the natural data size being assembled.
Bits which are skipped because
a field does not fit are made zero.
Thus "\fB.byte\fP 123" is equivalent to
"\fB.byte\fP 8:123" and "\fB.byte\fP 3:1,2:1,5:1"
assembles two bytes, containing the values 9 and 1.
.br
\fBNB:\fP Since no \s-2VAX\s+2 compilers currently use bit fields,
these bit field constructs are liable to disappear in the future.
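.PP
The three-field \fB.byte\fP example above can be worked out field by
field, assuming (as its stated result implies) that fields are filled
starting at the low-order bits of each byte:
.DS
\&	\fB.byte\fP 3:1,2:1,5:1
.DE
The 3-bit field holds 1 in bits 0\-2 and the 2-bit field holds 1 in
bits 3\-4, so the first byte is 1 + 8 = 9.
The 5-bit field does not fit in the 3 bits that remain, so those bits
are made zero and the field begins a second byte, whose value is 1.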
.NH 3
String Initialized Data
.DS
 \fB.ascii\fP \fIstring\fP [ \fB,\fP \fIstring\fP ]
 \fB.asciz\fP \fIstring\fP [ \fB,\fP \fIstring\fP ]
.DE
.PP
Each \fIstring\fP in the list is assembled into successive locations,
with the first letter in the string being placed
into the first location, etc.
The \fB.ascii\fP directive does not append a null byte to the string;
the \fB.asciz\fP directive does.
(Recall that strings are known by their length, and need not be terminated
with a null, and that the C conventions for escaping are understood.)
The \fB.ascii\fP directive is identical to:
.DS
\&\fB.byte\fP \fIstring\d\s-2\&0\s+2\u\fP\fB,\fP \fIstring\d\s-2\&1\s+2\u\fP\fB,\fP ...
.DE
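.PP
For instance (a sketch; the label names are hypothetical):
.DS
s1:	.ascii	"ab"	# 2 bytes: a, b
s2:	.asciz	"ab"	# 3 bytes: a, b, null
s3:	.ascii	"ab\e0"	# 3 bytes, the same as s2
.DE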
.NH 3
Zero Filled Data
.DS
\fB.space\fP \fIexpression\fP
.DE
.PP
(See \(sc 8.2.4)
\&\fIexpression\fP bytes of zeroes are assembled.
\&\fIexpression\fP must be absolute.
.NH 3
Arbitrarily Filled Data
.DS
\fB.fill\fP
\fIrep_expr\fP\fB, \fP
\fIsize_expr\fP\fB, \fP
\fIvalue_expr\fP\fR
.DE
.PP
All three expressions must be absolute.
\fIvalue_expr\fP,
treated as an expression of size \fIsize_expr\fP bytes,
is assembled and replicated \fIrep_expr\fP times.
The effect is to advance the current location counter
\fIrep_expr\fP \(** \fIsize_expr\fP bytes.
\fIsize_expr\fP must be between 1 and 8.
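.PP
For example, as a sketch with arbitrarily chosen values:
.DS
	.fill	3, 2, 0xffff	# three 2-byte values; advances 6 bytes
	.fill	16, 1, 0x20	# sixteen bytes, each holding 0x20
.DE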
.in -5m
.NH 2
Symbol Definition
.in +5m
.NH 3
General
.in +5m
.NH 4
\&\fB.comm\fI  name  \fB,  \fIexpression\fR
.PP
Provided the \fIname\fR is not defined elsewhere,
its type is made ``undefined external'', and its value is \fIexpression\fR.
In fact the \fIname\fR behaves
in the current assembly just like an
undefined external.
However, the link editor \fIld\fR has been special-cased
so that all external symbols which are not
otherwise defined, and which have a non-zero
value, are defined to lie in the bss
segment, and enough space is left after the
symbol to hold \fIexpression\fR
bytes.
.NH 4
\&\fB.lcomm\fI  name  \fB,  \fIexpression\fR
.PP
\fIexpression\fP bytes will be allocated in the bss segment and \fIname\fP
assigned the location of the first byte, but the \fIname\fP is not declared
as global and hence will be unknown to the link editor.
.NH 4
\&\fB.globl\fP  \fIname\fP
.PP
This statement makes the \fIname\fR external.
If it is otherwise defined (by \fB.set\fP or by
appearance as a label)
it acts within the assembly exactly as if
the \fB.globl\fR statement were not given;
however, the link editor may be used
to combine this routine with other routines that refer
to this symbol.
.PP
Conversely, if the given symbol is not defined
within the current assembly, the link editor
can combine the output of this assembly
with that of others which define the symbol.
The assembler makes all otherwise
undefined symbols external.
.NH 4
\&\fB.set\fP  \fIname\fP \fB,\fP \fIexpression\fP
.PP
The (\fIname\fP, \fIexpression\fP) pair is entered into the symbol table.
Multiple \fB.set\fP statements with the same name are legal;
the most recent value replaces all previous values.
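.PP
The four directives above might be combined as follows
(a sketch; all names and sizes are hypothetical):
.DS
	.set	BUFSIZ, 512	# absolute symbol
	.comm	_shared, BUFSIZ	# space is allocated in bss by ld
	.lcomm	scratch, 64	# 64 bytes of bss; unknown to ld
	.globl	_entry	# make the label below external
	.text
_entry:	ret
.DE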
.in -5m
.NH 3
Debugger Support
.in +5m
.NH 4
\&\fB.lsym\fP  \fIname\fP \fB,\fP \fIexpression\fP
.PP
A unique and otherwise unreferenceable instance of the
(\fIname\fP, \fIexpression\fP)
pair is created in the symbol table.
The Fortran 77 compiler uses this mechanism to pass local symbol definitions
to the link editor and debugger.
.NH 4
Special Symbol Table entries
.DS
\&\fB.stab\fP (\fIexpr\d\s-2i\s+2\u \fB,\fR)\|*NCPS\| \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP expr\d\s-2\&4\s+2\u\fR
.in +5m
\fR(normal \fBs\fPymbol \fBtab\fPle entry)\fR
.in -5m
\&\fB.stabs\fP \fIstring, expr\d\s-2\&1\s+2\u, expr\d\s-2\&2\s+2\u, expr\d\s-2\&3\s+2\u, expr\d\s-2\&4\s+2\u\fR
.in +5m
\fR(\fBstab s\fPtring)\fR
.in -5m
\&\fB.stabn\fP \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP expr\d\s-2\&4\s+2\u\fR
.in +5m
\fR(\fBstab n\fPone)\fR
.in -5m
\&\fB.stabd\fP \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fR
.in +5m
\fR(\fBstab d\fPot)\fR
.in -5m
.DE
.PP
The \fIstab\fP directives place symbols in the symbol table for the symbolic
debugger, \fIsdb\fP\s-2\u*\d\s+2.
.FS
.in +5
.ti -5
\s-2\u*\d\s+2Katseff, H.P. \fISdb: A Symbolic Debugger\fP.
Bell Laboratories, Holmdel,
NJ.  April 12, 1979.
.br
.ti -5
\&Katseff, H.P. \fISymbol Table Format for Sdb\fP. File 39394,
Bell Laboratories, Holmdel, NJ. March 14, 1979.
.in -5
.FE
In the \fB.stab\fP directive,
the first NCPS expressions are used for the
symbol name, which may be zero.
The \fB.stab\fP directive makes no sense if
the assembler recognizes arbitrary length symbols;
if so, the assembler complains.
The \fIstring\fP in the \fB.stabs\fP
directive more generally serves the same purpose as the NCPS expressions.
If the symbol name is zero, the
\&\fB.stabn\fP directive may be used instead.
.PP
The other expressions are stored in the name list structure in the symbol
table and preserved by the loader for reference by \fIsdb\fP\fR;
the values of the expressions are peculiar to formats required by \fIsdb\fP\fR.
.in +5m
.ti -5
\&\fIexpr\d\s-2\&1\s+2\u\fP is used as a symbol table tag
(nlist field \fIn_type\fP).
.br
.ti -5
\&\fIexpr\d\s-2\&2\s+2\u\fP seems to always be zero
(nlist field \fIn_other\fP).
.br
.ti -5
\&\fIexpr\d\s-2\&3\s+2\u\fP is used for either the
source line number, or for a nesting level
(nlist field \fIn_desc\fP).
.br
.ti -5
\fIexpr\d\s-2\&4\s+2\u\fR is used as tag specific information
(nlist field \fIn_value\fP).
In the
case of the \fB.stabd\fP directive, this expression is nonexistent, and
is taken to be the value of the location counter at the following instruction.
Since there is no associated name for a \fB.stabd\fP directive, it can
only be used in circumstances where the name is zero.
The effect of a \fB.stabd\fP directive can be achieved by one of the other
\&\fB.stab\fPx directives in the following manner:
.in -5m
.DS
\&	\fB.stabs\fP \fIstring\fB,\fP expr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP \fP LL\fIn\fP
LL\fIn\fP\fB:\fP
.DE
The \fB.stabd\fP directive is preferred, because it does not clog the symbol
table with labels used only for the stab symbol entries.
.in -5m
.in -5m
.NH 1
Machine instructions
.PP
The syntax of machine instruction statements accepted by \fIas\fP
is generally similar to the syntax of \s8DEC MACRO\s10-32.  There are
differences, however.
.NH 2
Character set
.PP
\fIas\fP uses the character `$' instead of `#',
and the character `*' instead of `@'.  Opcodes and register names
are spelled with lower-case rather than upper-case letters.
.NH 2
Lengths
.PP
Under certain circumstances, the following constructs are (optionally)
recognized by \&\fIas\fP to indicate the number of bytes to allocate
for unresolved expressions used to specify displacement or indirect
displacement addressing modes:
.DS
\&\fBB^\fP or	\fBB\`\fP	to indicate byte lengths		(1 byte)
\&\fBW^\fP or	\fBW\`\fP	to indicate word lengths		(2 bytes)
\&\fBL^\fP or	\fBL\`\fP	to indicate long word lengths	(4 bytes)
.DE
One can also use lower case \fBb\fP, \fBw\fP or \fBl\fP instead of the upper
case letters.
There must be no space between the size specifier letter and the \fB^\fP or
\&\fB\`\fP.
The constructs \fBS^\fP and \fBG^\fP are not recognized
by \fIas\fP as they are by the \s-2DEC\s+2 assembler.
It is preferred to use the "\`" displacement specifier,
so that the ``^'' is not
misinterpreted as the \fBxor\fP operator.
.PP
Literal values (including floating-point literals used where the
hardware expects a floating-point operand) are assembled as short
literals if possible, hence not needing the \fBS^\fP \s-2DEC\s+2
directive.  If the value of the displacement is known exactly in the
first pass \fIas\fP determines the length automatically, assembling it
in the shortest possible way, ignoring (if present) the length
expression.  If the value of the displacement is not known in the first
pass, \fIas\fP will use the length given by the
optional length specifier, or will use the length specified by the
\fB\-d\fP argument, or will default to 4 bytes.
.NH 2
CASE instructions
.PP
\fIas\fP considers the instructions \fBcaseb\fP, \fBcasel\fP, \fBcasew\fP
to have three operands (namely: selector, base, limit).
The displacements must be explicitly assembled using one
or more \fB.word\fP statements.
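.PP
A sketch of the usual arrangement (the labels are hypothetical), in
which each \fB.word\fP holds the displacement of a target from the
start of the table:
.DS
	casel	r0,$0,$2	# selector r0, base 0, limit 2
L1:	.word	L2\-L1	# selector 0
	.word	L3\-L1	# selector 1
	.word	L4\-L1	# selector 2
	jbr	L5	# reached when the selector is out of range
L2:	clrl	r1
	jbr	L5
L3:	movl	$1,r1
	jbr	L5
L4:	movl	$2,r1
L5:	ret
.DE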
.NH 2
Extended branch instructions
.PP
These opcodes (formed in general
by substituting a ``j'' for the initial ``b''
of the standard opcodes)
take as branch destinations the name of a label in the current
subsegment.  If the destination is close enough then the corresponding
``b'' instruction is assembled.  Otherwise the assembler chooses a sequence
of one or more instructions which together have the same effect as if the
``b'' instruction had a larger span.  In general, \fIas\fP chooses the
inverse branch followed by a \fBbrw\fP, but a \fBbrw\fP
is sometimes pooled among several ``j'' instructions with the same
destination.
If the \fB\-J\fP assembler option is given,
a \fBjmp\fP instruction is used instead of a \fBbrw\fP instruction
for \fBALL\fP (!!) ``j'' instructions with distant destinations.
This makes assembly of large (>32K bytes) assembly programs (inefficiently)
possible.
The current assembler does not try to use clever combinations of \fBbrb\fP,
\fBbrw\fP and \fBjmp\fP instructions.
The \fBjmp\fP instructions use PC relative addressing, with
the length of the offset given by the ``\fB\-d\fP'' assembler
option.
.KS
.DS
.ft B
.ta 1.0i 2.0i 3.0i
jeql	jeqlu	jneq	jnequ
jgeq	jgequ	jgtr	jgtru
jleq	jlequ	jlss	jlssu
jbcc	jbsc	jbcs	jbss
jlbc	jlbs
jcc	jcs	
jvc	jvs
jbc	jbs
jbr
.DE
.KE
\fBjbr\fR turns into \fBbrb\fR
if its target is close enough; else a \fBbrw\fP is used.
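.PP
As a sketch of the expansion described above (the label names are
hypothetical, and the exact sequence chosen by \fIas\fP may differ),
a \fBjeql\fP whose destination is out of byte range behaves as if the
inverse branch had been written around a \fBbrw\fP:
.DS
	jeql	far	# as written

	bneq	1f	# equivalent code when far is distant:
	brw	far	#   inverse branch around a brw
1:
.DE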
.NH 1
Diagnostics
.PP
Diagnostics are intended to be self-explanatory and appear on
the standard output.
.NH 1
Limits
.DS
.ta	2.0i
Arbitrary	Files to assemble
Arbitrary	Significant characters per name
127	Characters per input line
127	Characters per string
Arbitrary	Symbols
4	Text segments
4	Data segments
.DE