.ds :? Typesetting Mathematics
.de PT
.lt \\n(LLu
.pc %
.nr PN \\n%
.if \\n%-1 .if o .tl '\s9\f2\*(:?\fP''\\n(PN\s0'
.if \\n%-1 .if e .tl '\s9\\n(PN''\f2\*(:?\^\fP\s0'
.lt \\n(.lu
..
.tr _\(em
.tr *\(**
.de UC
\&\\$3\s-1\\$1\\s0\&\\$2
..
.de IT
.if n .ul
\&\\$3\f2\\$1\fP\&\\$2
..
.de UL
.if n .ul
\&\\$3\f3\\$1\fP\&\\$2
..
.de BD
.bd 3 \\$1
..
.de BI
.bd I \\$1
..
.de P1
.DS I 3n
.nf
.if n .ta 5 10 15 20 25 30 35 40 45 50 55 60
.if t .ta .4i .8i 1.2i 1.6i 2i 2.4i 2.8i 3.2i 3.6i 4i 4.4i 4.8i 5.2i 5.6i
.if t .tr -\(mi|\(bv'\(fm^\(no*\(**
.tr `\(ga'\(aa
.if t .tr _\(ul
.		\"use first argument as indent if present
..
.de P2
.if n .ls 2
.tr --||''``^^!!
.if t .tr _\(em
.DE
..
.hw semi-colon
.hw estab-lished
.		\"2=not last lines; 4= no -xx; 8=no xx-
.		\"special chars in programs
.ds . \s\\nP.\s0
.ds , \s\\nP,\s0
.ds ; \s\\nP\z,\v'-.3m'.\v'.3m'\s0
.ds : \s\\nP\z.\v'-.3m'.\v'.3m'\s0
.ds ' \s\\nQ\v'.25m'\(fm\v'-.25m'\s0
.ds ^ \h'-.1m'\(no\h'.1m'
.de WS
.sp \\$1
..
.hy 14
.nr PS 9
.nr VS 11
'\"	ND "Revised  April, 1977"
.EQ
delim $$
gsize 9
.EN
....TR 17
.TL
A System for Typesetting Mathematics
.AU
Brian W. Kernighan
.AU
Lorinda L. Cherry
.AI
.MH
.AB
.nr PS 9
.nr VS 11p
.PP
This paper describes the design and implementation
of a system for typesetting mathematics.
The language has been designed to be easy to learn
and to use
by people
(for example, secretaries and mathematical typists)
who know neither mathematics nor typesetting.
Experience indicates that the language can
be learned in an hour or so,
for it has few rules and fewer exceptions.
For typical expressions,
the size and font changes, positioning, line drawing,
and the like necessary to print according to mathematical conventions
are all done automatically.
For example,
the input
.sp 4p
.ce
sum from i=0 to infinity x sub i = pi over 2
.sp 4p
produces
.EQ
sum from i=0 to infinity x sub i = pi over 2
.EN
.PP
The syntax of the language is specified by a small
context-free grammar;
a compiler-compiler is used to make a compiler
that translates this language into typesetting commands.
Output may be produced on either a phototypesetter
or on a terminal with forward and reverse half-line motions.
The system interfaces directly with text formatting programs,
so mixtures of text and mathematics may be handled simply.
.LP
.LP
.PP
This paper is a revision of a paper originally published in
CACM, March, 1975.
.AE
.2C $gsize 9$
.NH
Introduction
.PP
``Mathematics is known in the trade as
.ul
difficult,
or
.ul
penalty, copy
because it is slower, more difficult,
and more expensive to set in type
than any other kind of copy normally
occurring in books and journals.''
[1]
.PP
One difficulty with mathematical text
is the multiplicity of characters,
sizes, and fonts.
An expression such as
.EQ
lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1
.EN
requires an intimate mixture of roman, italic and greek letters, in three sizes,
and a special character or two.
(``Requires'' is perhaps the wrong word,
but mathematics has its own typographical conventions
which are quite different from those
of ordinary text.)
Typesetting such an expression by traditional methods
is still an essentially manual operation.
.PP
A second difficulty is the two-dimensional character
of mathematics,
which the superscript and limits in the preceding example
showed in its simplest form.
This is carried further by
.EQ
a sub 0 + b sub 1 over
  {a sub 1 + b sub 2 over
    {a sub 2 + b sub 3 over
      {a sub 3 + ... }}}
.EN
.sp
and still further by
.EQ
define emx "{e sup mx}"
define mab "{m sqrt ab}"
define sa "{sqrt a}"
define sb "{sqrt b}"
int dx over {a emx - be sup -mx} ~=~
left { lpile {
     1 over {2 mab} ~log~ {sa emx - sb} over {sa emx + sb}
   above
     1 over mab ~ tanh sup -1 ( sa over sb emx )
   above
     -1 over mab ~ coth sup -1 ( sa over sb emx )
}
.EN
These examples also show line-drawing, built-up characters like braces and radicals,
and a spectrum of positioning problems.
(Section 6 shows
what a user has to type to produce these
on our system.)
.NH
Photocomposition
.PP
Photocomposition techniques
can be used to solve some of the problems of typesetting mathematics.
A phototypesetter is a device which exposes
a piece of photographic paper or film, placing characters
wherever they are wanted.
The Graphic Systems phototypesetter[2] on the
.UX
operating
system[3] works by shining light through a character stencil.
The character is made the right size by lenses,
and the light beam directed by fiber optics
to the desired place on a piece of photographic paper.
The exposed paper is developed and typically used
in some form of photo-offset reproduction.
.PP
On
.UX ,
the phototypesetter is driven by a formatting program called
.UC TROFF
[4].
.UC TROFF
was designed for setting running text.
It also provides all of the facilities that one needs for
doing mathematics, such as
arbitrary horizontal and vertical motions,
line-drawing, and size changing,
but the syntax for describing these special operations is
difficult to learn,
and difficult even for experienced users to type correctly.
.PP
For this reason we decided to use
.UC TROFF
as an ``assembly language,''
by
designing a language for describing mathematical
expressions,
and compiling it into
.UC TROFF .
.NH
Language Design
.PP
The fundamental principle upon which we based our language design
is that the language should be easy to use
by people (for example, secretaries) who know neither mathematics nor typesetting.
.PP
This principle implies
several things.
First,
``normal'' mathematical conventions about operator precedence,
parentheses, and the like cannot be used,
for to give special meaning to such characters means
that the user has to understand what he or she
is typing.
Thus the language should not assume, for instance,
that parentheses are always balanced,
for they are not in
the half-open interval $(a,b]$.
Nor should it assume
that $sqrt{a+b}$ can be replaced by
$(a+b) sup roman \(12$,
or that $1/(1-x)$ is better written as $1 over 1-x$
(or
vice versa).
.PP
Second, there should be relatively few rules,
keywords,
special symbols and operators, and the like.
This keeps the language easy to learn and remember. Furthermore, there should be few exceptions to
the rules that do exist:
if something works in one situation,
it should work everywhere.
If a variable can have a subscript,
then a subscript can have a subscript, and so on without limit.
.PP
Third, ``standard'' things should happen automatically.
Someone who types ``x=y+z+1'' should get ``$x=y+z+1$''.
Subscripts and superscripts should automatically
be printed in an appropriately smaller size,
with no special intervention.
Fraction bars have to be made the right length and positioned at the
right height.
And so on.
Indeed a mechanism for overriding default actions has to exist,
but its application is the exception, not the rule.
.PP
We assume
that the typist has a reasonable picture
(a two-dimensional representation)
of the desired final form, as might be handwritten
by the author of a paper.
We also assume that
the input is typed on a computer terminal much like an ordinary typewriter.
This implies an input alphabet
of perhaps 100 characters,
none of them special.
.PP
A secondary, but still important, goal in our design
was that the system should be easy to implement,
since neither of the authors had any desire to make
a long-term project of it.
Since our design was not firm,
it was also necessary that the program be easy to change
at any time.
.PP
To make the program easy to build and to change,
and to guarantee regularity
(``it should work everywhere''),
the language is defined by a
context-free grammar, described in Section 5.
The compiler for the language was built using a compiler-compiler.
.PP
A priori,
the grammar/compiler-compiler approach seemed the right thing to do.
Our subsequent experience leads us to believe
that any other course would have been folly.
The original language was designed in a few days.
Construction of a working system
sufficient to try significant examples
required perhaps a person-month.
Since then, we have spent a modest amount of additional time
over several years
tuning, adding facilities,
and occasionally changing the language as users
make criticisms and suggestions.
.PP
We also decided quite early that
we would let
.UC TROFF
do our work for us whenever possible.
.UC TROFF
is quite a powerful program, with
a macro facility, text and arithmetic variables, numerical computation and testing,
and conditional branching.
Thus we have been able to avoid writing
a lot of mundane but tricky software.
For example, we store no text strings,
but simply pass them on to
.UC TROFF .
Thus we avoid having to write a storage management package.
Furthermore, we have been able to isolate ourselves
from most details of the particular device and character set
currently in use.
For example, we let
.UC TROFF
compute the widths of all strings of characters;
we need know nothing about them.
.PP
A third design goal is special to our environment.
Since our program is only useful for typesetting mathematics,
it is necessary that it interface cleanly with the underlying typesetting language
for the benefit of users
who want to set intermingled mathematics and text
(the usual case).
The standard mode of operation
is that when a document is typed,
mathematical expressions are input as part of the text,
but marked by user-settable delimiters.
The program reads this input and treats as comments
those things which are not mathematics,
simply passing them through untouched.
At the same time it converts the mathematical input
into the necessary
.UC TROFF
commands.
The resulting output is passed directly to
.UC TROFF
where the comments and the mathematical parts both become
text and/or
.UC TROFF
commands.
.NH
The Language
.PP
We will not try to describe the language precisely here;
interested readers may refer to the appendix for more details.
Throughout this section, we will write expressions
exactly
as they are handed to the typesetting program (hereinafter called
.UC ``EQN'' ),
except that we won't show the delimiters
that the user types to mark the beginning and end of the expression.
The interface between
.UC EQN
and
.UC TROFF
is described at the end of this section.
.PP
As we said, typing x=y+z+1 should produce $x=y+z+1$,
and indeed it does.
Variables are made italic, operators and digits become roman,
and normal spacings between letters and operators are altered slightly
to give a more pleasing appearance.
.PP
Input is free-form.
Spaces and new lines in the input are used by
.UC EQN
to separate pieces of the input;
they are not used to create space in the output.
Thus
.P1
x    =    y
   + z + 1
.P2
also gives $x=y+z+1$.
Free-form input is easier to type initially;
subsequent editing is also easier,
for an expression may be typed as many short lines.
.PP
Extra white space can be forced into the output by several
characters of various sizes.
A tilde ``\|~\|'' gives a space equal
to the normal word spacing in text;
a circumflex gives half this much,
and a tab character spaces to the next tab stop.
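.PP
As a small sketch of forced spacing
(the expression is arbitrary, chosen only for illustration),
the input
.P1
x~=~y~+~z
.P2
should produce
.EQ
x~=~y~+~z
.EN
with a normal word space on each side of the operators.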
.PP
Spaces (or tildes, etc.)
also serve to delimit pieces of the input.
For example, to get
.EQ
f(t) = 2 pi int sin ( omega t )dt
.EN
we write
.P1
f(t) = 2 pi int sin ( omega t )dt
.P2
Here spaces are
.ul
necessary
in the input
to indicate that
.ul
sin, pi, int,
and
.ul
omega
are special, and potentially worth special treatment.
.UC EQN
looks up each such string of characters
in a table, and if appropriate gives it a translation.
In this case,
.ul
pi
and
.ul
omega
become their greek equivalents,
.ul
int
becomes the integral sign
(which must be moved down and enlarged so it looks ``right''),
and
.ul
sin
is made roman, following conventional mathematical practice.
Parentheses, digits and operators are automatically made roman
wherever found.
.PP
Fractions are specified with the keyword
.ul
over:
.P1
a+b over c+d+e = 1
.P2
produces
.EQ
a+b over c+d+e = 1
.EN
.PP
Similarly, subscripts and superscripts are introduced by the keywords
.ul
sub
and
.ul
sup:
.EQ
x sup 2 + y sup 2 = z sup 2
.EN
is produced by
.P1
x sup 2 + y sup 2 = z sup 2
.P2
The spaces after the 2's are necessary to mark the end of
the superscripts;
similarly the keyword
.ul
sup
has to be marked off by spaces or
some equivalent delimiter.
The return to the proper baseline is automatic.
Multiple levels of subscripts or superscripts
are of course allowed:
``x\|\|sup\|\|y\|\|sup\|\|z'' is
$x sup y sup z$.
The construct
``something
.ul
sub
something
.ul
sup
something''
is recognized as a special case,
so
``x sub i sup 2''
is
$x sub i sup 2$ instead of ${x sub i} sup 2$.
.PP
More complicated expressions can now be formed with these
primitives:
.EQ
{partial sup 2 f} over {partial x sup 2} =
x sup 2 over a sup 2 + y sup 2 over b sup 2
.EN
is produced by
.P1
.ce 0
   {partial sup 2 f} over {partial x sup 2} =
   x sup 2 over a sup 2 + y sup 2 over b sup 2
.P2
Braces {} are used to group objects together;
in this case they indicate unambiguously what goes over what
on the left-hand side of the expression.
The language defines the precedence of
.ul
sup
to be higher than that of
.ul
over,
so
no braces are needed to get the correct association on the right side.
Braces can always be used when in doubt
about precedence.
.PP
The braces convention is an example of the power
of using a recursive grammar
to define the language.
It is part of the language that if a construct can appear
in some context,
then
.ul
any expression
in braces
can also occur in that context.
.PP
There is a
.ul
sqrt
operator for making square roots of the appropriate size:
``sqrt a+b'' produces $sqrt a+b$,
and
.P1
x =  {-b +- sqrt{b sup 2 -4ac}} over 2a
.P2
is
.EQ
x={-b +- sqrt{b sup 2 -4ac}} over 2a
.EN
Since large radicals look poor on our typesetter,
.ul
sqrt
is not useful for tall expressions.
.PP
Limits on summations, integrals and similar
constructions are specified with
the keywords
.ul
from
and
.ul
to.
To get
.EQ
sum from i=0 to inf x sub i -> 0
.EN
we need only type
.P1
sum from i=0 to inf x sub i -> 0
.P2
Centering and making the $SIGMA$ big enough and the limits smaller
are all automatic.
The
.ul
from
and
.ul
to
parts are both optional,
and the central part (e.g., the $SIGMA$)
can in fact be anything:
.P1
lim from {x -> pi /2} ( tan~x) = inf
.P2
is
.EQ
lim from {x -> pi /2} ( tan~x) = inf
.EN
Again,
the braces indicate just what goes into the
.ul
from
part.
.PP
There is a facility for making braces, brackets, parentheses, and vertical bars
of the right height, using the keywords
.ul
left
and
.ul
right:
.P1
left [ x+y over 2a right ]~=~1
.P2
makes
.EQ
left [ x+y over 2a right ]~=~1
.EN
A
.ul
left
need not have a corresponding
.ul
right,
as we shall see in the next example.
Any characters may follow
.ul
left
and
.ul
right,
but generally only various parentheses and bars are meaningful.
.PP
Big brackets, etc.,
are often used with another facility,
called
.ul
piles,
which make vertical piles of objects.
For example,
to get
.EQ
sign (x) ~==~ left {
   rpile {1 above 0 above -1}
   ~~lpile {if above if above if}
   ~~lpile {x>0 above x=0 above x<0}
.EN
we can type
.P1
sign (x) ~==~ left {
   rpile {1 above 0 above -1}
   ~~lpile {if above if above if}
   ~~lpile {x>0 above x=0 above x<0}
.P2
The construction ``left {''
makes a left brace big enough
to enclose the
``rpile {...}'',
which is a right-justified pile of
``above ... above ...''.
``lpile'' makes a left-justified pile.
There are also centered piles.
Because of the recursive language definition,
a
pile
can contain any number of elements;
any element of a pile can of course
contain piles.
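.PP
As a sketch of a centered pile, assuming the keyword
.ul
cpile
that corresponds to the
.UC CPILE
alternative in the grammar of Section 5
(the elements are arbitrary, chosen only for illustration),
the input
.P1
A ~=~ cpile {a above b+c above d+e+f}
.P2
should stack the three elements with their centers aligned:
.EQ
A ~=~ cpile {a above b+c above d+e+f}
.EN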
.PP
Although
.UC EQN
makes a valiant attempt
to use the right sizes and fonts,
there are times when the default assumptions
are simply not what is wanted.
For instance the italic
.ul
sign
in the previous example would conventionally
be in roman.
Slides and transparencies often require larger characters than normal text.
Thus we also provide size and font
changing commands:
``size 12 bold {A~x~=~y}''
will produce
$size 12 bold{ A~x~=~y}$.
.ul
Size
is followed by a number representing a character size in points.
(One point is 1/72 inch;
this paper is set in 9 point type.)
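.PP
As a minimal sketch of a font change applied to the
.ul
sign
of the previous example,
assuming that
.ul
roman
affects only the single item that follows it
(the right-hand side is arbitrary),
one might type
.P1
roman sign (x) ~=~ 1
.P2
which should give
.EQ
roman sign (x) ~=~ 1
.EN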
.PP
If necessary, an input string can be quoted in "...",
which turns off grammatical significance, and any font or spacing changes that might otherwise be done on it.
Thus we can say
.P1
lim~ roman "sup" ~x sub n = 0
.P2
to ensure that the supremum doesn't become a superscript:
.EQ
lim~ roman "sup" ~x sub n = 0
.EN
.PP
Diacritical marks, long a problem in traditional typesetting,
are straightforward:
.EQ
x dot under + x hat + y tilde + X hat + Y dotdot = z+Z bar
.EN
is made by typing
.P1
x dot under + x hat + y tilde
+ X hat + Y dotdot = z+Z bar
.P2
.PP
There are also facilities for globally changing default
sizes and fonts, for example for making viewgraphs
or for setting chemical equations.
The language allows for matrices, and for lining up equations
at the same horizontal position.
.PP
Finally, there is a definition facility,
so a user can say
.P1
define name "..."
.P2
at any time in the document;
henceforth, any occurrence of the token ``name''
in an expression
will be expanded into whatever was inside
the double quotes in its definition.
This lets users tailor
the language to their own specifications,
for it is quite possible to redefine
keywords
like
.ul
sup
or
.ul
over.
Section 6 shows an example of definitions.
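.PP
A minimal sketch of a definition and its use follows;
the name
.ul
xn
and its replacement text are hypothetical,
chosen only for illustration:
.P1
define xn "x sub n"
xn + y = 1
.P2
Under the expansion rule just described,
the second line should be treated as if
``x sub n + y = 1'' had been typed.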
.PP
The
.UC EQN
preprocessor reads intermixed text and equations,
and passes its output to
.UC TROFF .
Since
.UC TROFF
uses lines beginning with a period as control words
(e.g., ``.ce'' means ``center the next output line''),
.UC EQN
uses the sequence ``.EQ'' to mark the beginning of an equation and
``.EN'' to mark the end.
The ``.EQ'' and ``.EN'' are passed through to
.UC TROFF
untouched,
so they can also be used by a knowledgeable user to
center equations, number them automatically, etc.
By default, however,
``.EQ'' and ``.EN'' are simply ignored by
.UC TROFF ,
so equations are printed in-line.
.PP
``.EQ'' and ``.EN'' can be supplemented by
.UC TROFF
commands as desired;
for example, a centered display equation
can be produced with the input:
.P1
.ce 0
.in 5
 .ce
 .EQ
 x sub i = y sub i ...
 .EN
.in 0
.P2
.PP
Since it is tedious to type
``.EQ'' and ``.EN'' around very short expressions
(single letters, for instance),
the user can also define two characters to serve
as the left and right delimiters of expressions.
These characters are recognized anywhere in subsequent text.
For example, if the left and right delimiters have both been set to ``#'',
the input:
.P1
Let #x sub i#, #y# and #alpha# be positive
.P2
produces:
.P1
Let $x sub i$, $y$ and $alpha$ be positive
.P2
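.PP
Delimiters of this kind are set with a
.ul
delim
statement placed between ``.EQ'' and ``.EN''.
A minimal sketch, with ``#'' as the (arbitrary) choice for both delimiters:
.P1
 .EQ
 delim ##
 .EN
.P2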
.PP
Running a preprocessor is strikingly easy on
.UC UNIX .
To typeset
text stored in file
``f\|'',
one issues the command:
.P1
eqn f | troff
.P2
The vertical bar connects the output
of one process
.UC EQN ) (
to the input of another
.UC TROFF ). (
.NH
Language Theory
.PP
The basic structure of the language is
not a particularly original one.
Equations are pictured as a set of ``boxes,''
pieced together in various ways.
For example, something with a subscript is
just a box followed by another box moved downward
and shrunk
by an appropriate amount.
A fraction is just a box centered above another box,
at the right altitude,
with a line of correct length drawn between them.
.PP
The grammar for the language is shown below.
For purposes of exposition, we have collapsed
some productions. In the original grammar, there
are about 70 productions, but many of these
are simple ones used only to guarantee
that some keyword is recognized early enough in the parsing process.
Symbols in
capital letters
are terminal symbols;
lower case
symbols are non-terminals, i.e., syntactic categories.
The vertical bar \(bv indicates an alternative;
the brackets [ ] indicate optional material.
A
.UC TEXT
is a string of non-blank characters or
any string inside double quotes;
the other terminal symbols represent literal occurrences
of the corresponding keyword.
.P1
.ce 0
.ta .3i
.ps 9
.ne 17
.in 1
eqn	: box | eqn box
.sp 5p
box	: text
	| { eqn }
	| box OVER box
	| SQRT box
	| box SUB box | box SUP box
	| [ L | C | R ]PILE { list }
	| LEFT text eqn [ RIGHT text ]
	| box [ FROM box ] [ TO box ]
	| SIZE text box
	| [ROMAN | BOLD | ITALIC] box
	| box [HAT | BAR | DOT | DOTDOT | TILDE]
	| DEFINE text text
.sp 5p
list	: eqn | list ABOVE eqn
.sp 5p
text	: TEXT
.ps 10
.in 0
.P2
.PP
The grammar makes it obvious why there are few exceptions.
For example, the observation that something can be replaced by a more complicated something
in braces is implicit in the productions:
.P1
.ce 0
   eqn	: box | eqn box
   box	: text | { eqn }
.P2
Anywhere a single character could be used,
.ul
any
legal construction can be used.
.PP
Clearly, our grammar is highly ambiguous.
What, for instance, do we do with the input
.P1
a over b over c  ?
.P2
Is it
.P1
{a over b} over c
.P2
or is it
.P1
a over {b over c}  ?
.P2
.PP
To answer questions like this, the grammar
is supplemented with a small set of rules that describe the precedence
and associativity
of operators.
In particular, we specify (more or less arbitrarily)
that
.ul
over
associates to the left,
so the first alternative above is the one chosen.
On the other hand,
.ul
sub
and
.ul
sup
bind to the right,
because this is closer to standard mathematical practice.
That is, we assume $x sup a sup b$ is $x sup {(a sup b )}$,
not  $(x sup a ) sup b$.
.PP
The precedence rules resolve the ambiguity in a construction like
.P1
a sup 2 over b
.P2
We define
.ul
sup
to have a higher precedence than
.ul
over,
so this construction is parsed as
$a sup 2 over b$ instead of $a sup {2 over b}$.
.PP
Naturally, a user can always
force a particular parsing
by placing braces around expressions.
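.PP
To make the difference concrete,
the two bracketings of the earlier example,
.P1
{a over b} over c
a over {b over c}
.P2
should be set, respectively, as
.EQ
{a over b} over c ~~~ roman "and" ~~~ a over {b over c}
.EN
The first is what the unbraced input ``a over b over c'' is expected to give,
since
.ul
over
associates to the left.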
.PP
The ambiguous grammar approach seems to be quite useful.
The grammar we use is small enough to be easily understood,
for it contains none of the productions that would be
normally used for resolving ambiguity.
Instead the supplemental information about
precedence and associativity (also small enough to be understood)
provides the compiler-compiler
with the information it needs
to make a fast, deterministic parser for
the specific language we want.
When the language is supplemented by the disambiguating rules,
it is in fact
.UC LR(1)
and thus easy to parse[5].
.PP
The output code is generated as the input is scanned.
Any time a production
of the grammar is recognized,
(potentially) some
.UC TROFF
commands are output.
For example, when the lexical analyzer
reports that it has found a
.UC TEXT
(i.e., a string of contiguous characters),
we have recognized the production:
.P1
text    : TEXT
.P2
The translation of this is simple.
We generate a local name for the string,
then hand the name and the string to
.UC TROFF ,
and let
.UC TROFF
perform the storage management.
All we save is the name of the string,
its height, and its baseline.
.PP
As another example,
the translation associated with the production
.P1
box    : box OVER box
.P2
is:
.P1
.ce 0
.in 1
.ne 14
Width of output box =
  slightly more than largest input width
Height of output box =
  slightly more than sum of input heights
Base of output box =
  slightly more than height of bottom input box
String describing output box =
  move down;
  move right enough to center bottom box;
  draw bottom box (i.e., copy string for bottom box);
  move up; move left enough to center top box;
  draw top box (i.e., copy string for top box);
  move down and left; draw line full width;
  return to proper base line.
.in 0
.P2
Most of the other productions have
equally simple semantic actions.
Picturing the output as a set of properly placed boxes
makes the right sequence of positioning commands
quite obvious.
The main difficulty is in finding the right numbers to use
for esthetically pleasing positioning.
.PP
With a grammar, it is usually clear how to extend the language.
For instance, one of our users
suggested a
.UC TENSOR
operator, to make constructions like
.EQ
~ sub size 7 m sup size 7 l
{bold T from n to k} sub size 7 i sup size 7 j
.EN
Grammatically, this is easy:
it is sufficient to add a production like
.P1
box    : TENSOR { list }
.P2
Semantically, we need only juggle the boxes to the right places.
.NH
Experience
.PP
There are really three aspects of interest_how
well
.UC EQN
sets mathematics,
how well it satisfies its goal
of being ``easy to use,''
and how easy it was to build.
.PP
The first question is easily addressed.
This entire paper
has been set by the program.
Readers can judge for themselves
whether it is good enough for their purposes.
One of our users commented that although the output
is not as good as the best hand-set material,
it is still
better than average,
and much better than
the worst.
In any case, who cares?
Printed books cannot compete with the birds and flowers
of illuminated manuscripts on esthetic grounds,
either,
but they have some clear economic advantages.
.PP
Some of the deficiencies in the output could
be cleaned up with more work on our part.
For example, we sometimes leave too much space between
a roman letter and an italic one.
If we were willing to keep track of the fonts
involved,
we could do this better more of the time.
.PP
Some other weaknesses are inherent in our output device.
It is hard, for instance, to draw a line
of an arbitrary length without getting
a perceptible overstrike at one end.
.PP
As to ease of use,
at the time of writing,
the system has been used by two distinct groups.
One user population consists of mathematicians,
chemists, physicists, and computer scientists.
Their typical reaction has been something like:
.IP " (1)"
It's easy to write, although I make the following mistakes...
.IP " (2)"
How do I do...?
.IP " (3)"
It botches the following things.... Why don't you fix them?
.IP " (4)"
You really need the following features...
.sp 5p
.PP
The learning time is short.
A few minutes gives the general flavor,
and typing a page or two of a paper generally
uncovers most of the misconceptions about how it works.
.PP
The second user group is much larger,
the secretaries and mathematical typists
who were the original target of the system.
They tend to be enthusiastic converts.
They find the language easy to learn
(most are largely self-taught),
and have little trouble producing the output they want.
They are of course less critical of the esthetics of their output
than users trained in mathematics.
After a transition period, most find
using a computer more interesting than
a regular typewriter.
.PP
The main difficulty that users have seems to be remembering
that a blank is a delimiter;
even experienced users use blanks where they shouldn't and omit them
when they are needed.
A common instance is typing
.P1
f(x sub i)
.P2
which produces
.EQ
f(x sub i)
.EN
instead of
.EQ
f(x sub i )
.EN
Since the
.UC EQN
language knows no mathematics, it cannot deduce that the
right parenthesis is not part of the subscript.
.PP
The language is somewhat prolix, but this doesn't seem
excessive considering how much is being done,
and it is certainly more compact than the corresponding
.UC TROFF
commands.
For example, here is the source for the continued fraction
expression in Section 1 of this paper:
.P1
.ne 4
.ce 0
     a sub 0 + b sub 1 over
       {a sub 1 + b sub 2 over
         {a sub 2 + b sub 3 over
           {a sub 3 + ... }}}
.P2
This is the input for the large integral of Section 1;
notice the use of definitions:
.P1
.ce 0
.ne 15
.in 1
define emx "{e sup mx}"
define mab "{m sqrt ab}"
define sa "{sqrt a}"
define sb "{sqrt b}"
int dx over {a emx - be sup -mx} ~=~
left { lpile {
     1 over {2 mab} ~log~
           {sa emx - sb} over {sa emx + sb}
   above
     1 over mab ~ tanh sup -1 ( sa over sb emx )
   above
     -1 over mab ~ coth sup -1 ( sa over sb emx )
}
.in 0
.P2
.PP
As to ease of construction,
we have already
mentioned that there are really only a few person-months
invested.
Much of this time has gone into two things_fine-tuning
(what is the most esthetically pleasing space to use
between the numerator and denominator of a fraction?),
and changing things found deficient by our users
(shouldn't a tilde be a delimiter?).
.PP
The program consists of a number of small,
essentially unconnected modules for code generation,
a simple lexical analyzer,
a canned parser which we did not have to write,
and some miscellany associated with input files
and the macro facility.
The program is now about 1600 lines of
.UC C
[6], a high-level language reminiscent of
.UC BCPL .
About 20 percent of these lines are ``print'' statements,
generating the output code.
.PP
The semantic routines that generate the actual
.UC TROFF
commands can be changed to accommodate other formatting languages
and devices.
For example, in less than 24 hours,
one of us changed the entire semantic package
to drive
.UC NROFF ,
a variant of
.UC TROFF ,
for typesetting mathematics on teletypewriter devices
capable of reverse line motions.
Since many potential users do not have access
to a typesetter, but still have to type mathematics,
this provides a way to get a typed version of the final output
which is close enough for debugging purposes,
and sometimes even for ultimate use.
.NH
Conclusions
.PP
We think we have shown that it is possible
to do acceptably good typesetting of mathematics
on a phototypesetter,
with an input language that is easy to learn and use and
that satisfies many users' demands.
Such a package can be implemented in
short order,
given a compiler-compiler and
a decent typesetting program underneath.
.PP
Defining a language, and building a compiler for it
with a compiler-compiler
seems like the only sensible way to do business.
Our experience with the use of
a grammar and a compiler-compiler has been
uniformly favorable.
If we had written everything into code directly,
we would have been locked into
our original design.
Furthermore, we would never have been sure
where the exceptions and special cases were.
But because we have a grammar, we can change our minds readily and still be reasonably
sure that if a construction works in one place
it will work everywhere.
.SH
Acknowledgements
.PP
We are deeply indebted to
J. F. Ossanna,
the author of
.UC TROFF ,
for his willingness to modify
.UC TROFF
to make our task easier
and for his continuous assistance
during the development of our program.
We are also grateful to
A. V. Aho for help with language theory,
to S. C. Johnson for aid with the compiler-compiler,
and to our early users
A. V. Aho, S. I. Feldman, S. C. Johnson,
R. W. Hamming,
and M. D. McIlroy
for their constructive criticisms.
.SH
References
.IP [1]
.ul
A Manual of Style,
12th Edition.
University of Chicago Press, 1969, p.\ 295.
.IP [2]
.ul
Model C/A/T Phototypesetter.
Graphic Systems, Inc.,
Hudson, NH.
.IP [3]
Ritchie, D. M., and Thompson, K. L.,
``The UNIX time-sharing system.''
\fIComm. ACM 17,\^\fR 7 (July 1974), pp.\ 365-75.
.IP [4]
Ossanna, J. F.,
.ul
NROFF/TROFF User's Manual.
Bell Laboratories, 1977.
.IP [5]
Aho, A. V., and Johnson, S. C.,
``LR Parsing.''
\fIComp. Surv. 6,\^\fR 2 (June 1974), pp.\ 99-124.
.br
.IP [6]
Kernighan, B. W., and Ritchie, D. M.,
.ul
The C Programming Language.
Prentice-Hall, Inc., 1978.
.in 0
.sp
.I "May 1979"