.so tmac.ilib
.TH RSG 1 "The University of Arizona \- 5/16/83"
.SH NAME
rsg \- generate random sentences
.SH SYNOPSIS
\f3rsg\fP [\f3\-s\fI n\fR] [\f3\-l\fI n\fR] [\f3\-t\fR]
.SH DESCRIPTION
\fIRsg\fR generates randomly selected sentences from a grammar specified by
the user.
.PP
The following options may appear in any order:
.IP "\f3\-s\fI n\fR"
Set the seed for random generation to \fIn\fR.
The default seed is 0.
.IP "\f3\-l\fI n\fR"
Terminate generation if the number of symbols remaining to be processed
exceeds \fIn\fR. There is no default limit.
.IP \f3\-t\fR
Trace the generation of sentences. Trace output is written to standard
error.
.PP
\fIRsg\fR works interactively, allowing the user to build, test, modify,
and save grammars. Input to \fIrsg\fR consists of various kinds of
specifications, which can be intermixed:
.PP
\fIProductions\fR define nonterminal symbols in a syntax similar to 
the rewriting rules of BNF with various alternatives consisting
of the concatenation of nonterminal and terminal symbols.
.PP
\fIGeneration specifications\fR cause the generation of a specified
number of sentences from the language defined by a given nonterminal
symbol.
.PP
\fIGrammar output specifications\fR cause the definition of a
specified nonterminal or the entire current grammar to be written
to a given file.
.PP
\fISource specifications\fR cause subsequent input to be read from
a specified file.
.PP
In addition, any line beginning with \*M#\fR is considered to be
a comment, while any line beginning with \*M=\fR causes the rest
of that line to be used as a prompt to the user whenever \fIrsg\fR
is ready for input (there normally is no prompt). A line consisting
of a single \*M=\fR stops prompting.
.SH \0\0\0Productions
Examples of productions are:
.DS
<expr>::=<term>|<term>+<expr>
<term>::=<element>|<element>*<term>
<element>::=x|y|z|(<expr>)
.DE
Productions may occur in any order. The definition for a nonterminal
symbol can be changed by specifying a new production for it.
.PP
There are a number of special devices to facilitate the definition of
grammars, including eight predefined, built-in nonterminal symbols:
.nf
.sp 1
.ta .5i 1.5i
	symbol	definition
.sp .5
	\*M<lb>	<
	<rb>	>
	<vb>	|
	<nl>\fR	newline
	\*M<>\fR	empty string
	\*M<&lcase>\fR	any single lowercase letter
	\*M<&ucase>\fR	any single uppercase letter
	\*M<&digit>\fR	any single digit
.sp 1
.fi
In addition, if the string between a \*M<\fR and \*M>\fR
begins and
ends with a single quotation mark, that construction stands for
any single character between the quotation marks. For example,
.DS
<'xyz'>
.DE
is equivalent to
.DS
x|y|z
.DE
Finally, if the name of a nonterminal symbol between the \*M<\fR and
\*M>\fR begins with \*M?\fR, the user is queried during generation
to supply a string for that nonterminal symbol. For example, in
.DS
<expr>::=<term>|<term>+<expr>|<?expr>
.DE
if the third alternative is encountered during generation, the user is
asked to provide a string for \*M<expr>\fR.
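.PP
The quoted-character form described above can be illustrated with a short
Python sketch. It is not part of \fIrsg\fR; the function name
\*Mexpand_char_class\fR is hypothetical, and the sketch assumes only the rule
stated in this section, namely that each character between the quotation
marks becomes one alternative.
.DS
```python
def expand_char_class(symbol):
    """Return the alternatives for a symbol such as <'xyz'>, or None.

    Hypothetical helper: it models only the rule stated in the text.
    """
    if not (symbol.startswith("<") and symbol.endswith(">")):
        return None
    inner = symbol[1:-1]  # text between < and >
    if len(inner) >= 2 and inner[0] == "'" and inner[-1] == "'":
        # One alternative per character between the quotation marks.
        return list(inner[1:-1])
    return None  # an ordinary nonterminal, not a quoted form


alternatives = expand_char_class("<'xyz'>")
print("|".join(alternatives))
```
.DE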
.SH \0\0\0Generation Specifications
A generation specification consists of a nonterminal symbol
followed by a nonnegative integer. An example is
.DS
<expr>10
.DE
which specifies the generation of 10 \*M<expr>\fRs. If the
integer is omitted, it is assumed to be 1. Generated sentences
are written to standard output.
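.PP
As a rough illustration of what a generation specification asks for, the
following Python sketch expands the example grammar from the Productions
section. It is a sketch under stated assumptions, not the \fIrsg\fR
implementation: it assumes that alternatives are chosen uniformly at random,
that symbols are processed left to right, and that exceeding the limit simply
abandons the sentence. The names \*MGRAMMAR\fR and \*Mgenerate\fR are
hypothetical, and Python's seeded generator does not reproduce the sequence
that \fIrsg\fR produces for the same seed.
.DS
```python
import random

# The example grammar, with one list of alternatives per nonterminal.
GRAMMAR = {
    "<expr>": [["<term>"], ["<term>", "+", "<expr>"]],
    "<term>": [["<element>"], ["<element>", "*", "<term>"]],
    "<element>": [["x"], ["y"], ["z"], ["(", "<expr>", ")"]],
}


def generate(start, rng, limit=None):
    """Expand start into a sentence, or return None if limit is exceeded."""
    pending = [start]  # symbols remaining to be processed, leftmost first
    out = []
    while pending:
        # Modeled on the -l option: give up when too many symbols remain.
        if limit is not None and len(pending) > limit:
            return None
        symbol = pending.pop(0)
        if symbol in GRAMMAR:
            # Assumption: each alternative is equally likely.
            pending[:0] = rng.choice(GRAMMAR[symbol])
        else:
            out.append(symbol)  # terminal symbol
    return "".join(out)


rng = random.Random(0)  # fixed seed, in the spirit of the -s option
for _ in range(3):
    print(generate("<expr>", rng, limit=100))
```
.DE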
.SH \0\0\0Grammar Output Specifications
A grammar output specification consists of a nonterminal symbol,
followed by \*M\->\fR, followed by a file name. Such a specification
causes the current definition of the nonterminal symbol to be
written to the given file. If the file is omitted, standard output
is assumed. If the nonterminal symbol is omitted, the entire grammar
is written out. Thus,
.DS
\->
.DE
causes the entire grammar to be written to standard output.
.SH \0\0\0Source Specifications
A source specification consists of \*M@\fR followed by a file name.
Subsequent input is read from that file. When an end of file is encountered,
input reverts to the previous file. Input files can be nested.
.SH DIAGNOSTICS
Syntactically erroneous input lines are noted but otherwise ignored.
.PP
Specifications for a file that cannot be opened are noted and treated as
erroneous.
.PP
If an undefined nonterminal symbol is encountered during generation,
an error message that identifies the undefined symbol is produced,
followed by the partial sentence generated to that point. Exceeding
the limit of symbols remaining to be generated as specified by
the \f3\-l\fR option is handled similarly.
.SH CAVEATS
Generation may fail to terminate because of a loop in the rewriting
rules or, more seriously, because of the progressive accumulation
of nonterminal symbols. The latter problem can be identified
by using the \f3\-t\fR option and controlled by using the \f3\-l\fR
option. The problem often can be circumvented by duplicating alternatives
that lead to fewer rather than more nonterminal symbols. For
example, changing
.DS
<expr>::=<term>|<term>+<expr>
.DE
to
.DS
<expr>::=<term>|<term>|<term>+<expr>
.DE
increases the probability of selecting \*M<term>\fR from 1/2 to 2/3.
See the second reference listed below for a discussion of the general
problem.
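.PP
The arithmetic behind this example can be checked with a small Python
sketch. It assumes, as the 1/2 and 2/3 figures above imply, that each
alternative is equally likely to be selected; the function names are
hypothetical. Under that assumption the number of \*M<term>\fR symbols
produced by one \*M<expr>\fR is geometrically distributed, so its mean is the
reciprocal of the probability of choosing a non-recursive alternative.
.DS
```python
from fractions import Fraction


def stop_probability(alternatives, recursive_symbol):
    """Chance that a uniformly chosen alternative avoids recursive_symbol."""
    stopping = [alt for alt in alternatives if recursive_symbol not in alt]
    return Fraction(len(stopping), len(alternatives))


def expected_terms(alternatives):
    """Mean number of <term> symbols produced by one <expr>."""
    # Every alternative emits one <term>; expansion stops with probability p,
    # so the count is geometric with mean 1/p.
    p = stop_probability(alternatives, "<expr>")
    return 1 / p


original = ["<term>", "<term>+<expr>"]
weighted = ["<term>", "<term>", "<term>+<expr>"]

for name, alts in (("original", original), ("weighted", weighted)):
    print(name, stop_probability(alts, "<expr>"), expected_terms(alts))
```
.DE
Duplicating the non-recursive alternative lowers the mean from 2 to 3/2
\*M<term>\fR symbols per \*M<expr>\fR under this model.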
.SH SEE ALSO
.Ib
pp. 211-219, 301-302.
.PP
Wetherell, C. S. ``Probabilistic Languages: A Review and Some Open
Questions'', \fIComputing Surveys\fR, Vol. 12, No. 4 (1980), pp. 361-379.
.SH AUTHOR
Ralph E. Griswold