V10/vol2/index/index.ms

.so ../ADM/mac
.XX index 609 "Index"
.TL
Index
.AU
L. L. Cherry
.AI
.MH
.AB
.AE
.2C
.NH
Introduction
.PP
Because the index that follows was generated mechanically,
a description of its layout and contents will help
you use it more effectively.
I'll describe how the index is organized,
how the terms were generated, how the index was generated from
the terms, and finally, the difficulties.
.NH
The Organization
.PP
Each of the papers is assigned a nickname that
is printed in italic to the left of the title in the
Table of Contents and in the header of the paper.
So, for example, the paper 
"A System for
Algorithm Animation" is called \fIanim\fR and
the paper ``A Troff Tutorial'' is called \fItrofftut\fR.
In the index the paper nicknames always appear in italics.
Literals, which may be file names, command names or keywords
in a language, appear in constant width, i.e. \f(CW/bin\fR,
\f(CWawk\fR, \f(CWcircle\fR.
The terms in the index are not fully cross-referenced.
Each paper appears alphabetically under its nickname, followed by an
indented mini-index of terms specific to that paper.
Generally, these terms are not cross-referenced in the top or global level.
So, for example, the terms \f(CWagain\fR, \f(CWbackward\fR,
and ``current view'' in \fIanim\fR only appear indented under \fIanim\fR.
Terms you might look up to determine whether a topic is
discussed anywhere in this volume, however, are fully cross-referenced,
like ``Movie Program'' or ``binary search tree.''
File names and program names are also indexed globally.
Terms that would be redundant or useless under the specific
paper only appear at the global
level.
So, ``Algorithm Animation'' is not in the mini-index under \fIanim\fR.
.NH
Getting the terms
.PP
The terms for the index were generated mechanically
from the text of the papers.
For each paper, a file containing the title and headings, repeated noun phrases,
and distinguished words was generated.
By distinguished words, I mean those that appear in the paper in
a distinguishing font like italics, bold or constant width.
This file was than manually edited to a reasonable list of
indexing terms.
Finally, the terms were coded with indexing level (only global, only specific or
fully cross-referenced) and font information.
.NH
Index Generation
.PP
Production of the index was completely automatic once the terms were generated
and edited.
A form of the \fIdiction\fP program that does pattern matching on English
text, mapping upper case letters to lower case and punctuation to blanks,
located the terms in the text of the paper.
Using this location information, \f(CW.Tm\fR commands
were inserted in the paper to cause \fItroff\fR to print the page number
for each term.
An \fIawk\fR program combined the page numbers for each term and
generated the global and local versions of the indexing term for each paper.
For example, in \fIbackup\fR, references to \f(CW/n\fR, the network
file system, appear in the file generated
by \fItroff\fR as,
.P1
596	/n	c
597	/n	c
602	/n	c
.P2
and become two indexing entries
.P1
/n, backup, 596-597, 602	c
backup, /n, 596-597, 602	c
.P2
The \f(CWc\fR is a code for constant width font.
.PP
Finally the indexes for all of the papers are sorted into one index
and another \fIawk\fR program merges like terms and adds font changes.
So for the term \f(CW/n\fR, which is referenced in \fIbackup\fR, \fInetb\fR, and \fIsetup\fR,
we get
.DS
\f(CW/n\fP, \fIbackup\fR, 596-597, 602
    \fInetb\fR, 519
    \fIsetup\fR, 499-500
.DE
at the global level and \f(CW/n\fR also appears in the mini-indexes
for \fIbackup\fR, \fInetb\fR, and \fIsetup\fR.
.NH
Difficulties
.PP
Because the terms are generated directly from the text
and the text was written by 40 authors,
language choices differ among the papers, and hence
among the terms in the index.
To locate a paper of interest, you may have to look
in the vicinity of the term.
For example, although there are 14 papers that deal with
``typesetting'', not all of them explicitly include the term.
So we have indexing terms ``typeset,'' ``typesetter'' and
``typesetter graphics.''
.PP
Terms as they appear in the papers may not be in the same
font or case as they appear in the index.
There are several reasons for this.
First the matching is done solely in lower case
in order to match terms that begin sentences
or appear in headings.
Terms that were derived from headings
have their upper case letters restored but may appear in the text
of the paper in lower case as well.
For some indexing terms case is significant, like the \f(CWd\fR and \f(CWD\fR
commands in \fIsam\fR, but unfortunately lost.
Both commands are mapped to lower case.
Because font changes in the index carry information
(paper nickname, literal) the font in the index may differ
from that used in the paper.
Although most of language papers use constant width font for
literals, UNIX commands appear in italic.
In the index UNIX commands are in constant width.
.PP
Where possible, indexing terms are matched in programming
examples as well as in the text of the papers.
For some terms, like the \f(CWif\fR statement in \fIspin\fR,
this is not possible without matching every occurrence
of the word ``if'' in the text.
.PP
Finally, the page number of the term in the index may be off by one,
referring to the page before the actual occurrence of the term.
When the terms are inserted in the text of the papers
for \fItroff\fR to compute their page numbers, they are
added in front of the sentence in which they occur.
If this sentence begins on the preceding page, then
the term will be attributed to that page rather than
the one on which it physically appears.
.PP
If you remember to look in the vicinity of
the term you're looking for in the index and not
expect the font or case to be identical in the paper,
you should find the information you want in this volume.
.NH
Acknowledgements
.PP
I am indebted to Doug McIlroy, N. Peter Nelson
and Lloyd Nakatani for looking at early
versions of the index and for their helpful suggestions
on its form.
....
.... augmented term in index - 5620 ->5620 terminal
..... sometimes lc used to enhence folding of terms