V10/vol2/snocone/snocone.ms

.so ../ADM/mac
.XX snocone 297 "The Snocone Programming Language"
.nr dP 2
.nr dV 3p
.EQ
delim $$
.EN
.ds S4 \s-2SNOBOL4\s0
.de Sx
.P1 0
..
.de Sy
.P2
..
.de Tx
.IP "" 5
..
.de Ty
.PP
..
.TL
The Snocone Programming Language
.AU
Andrew Koenig
.AI
.MH
.AB
.PP
\*(S4 is a well-known programming language with convenient
semantics and clumsy syntax.
Following the pattern of Ratfor and \s-2EFL\s0,
we have added ``syntactic sugar'' to
\*(S4, with an eye toward making it easier to use.
We call the result
.I Snocone.
.PP
This paper describes the Snocone language in enough
detail that people not familiar with \*(S4 can
learn to use it, although previous familiarity
with \*(S4 will help.
.AE
.2C
.NH 1
Introduction
.PP
The semantics of the \*(S4 programming language |reference(poage griswold)
include such unusual and useful
features as
dynamic typing,
a general-purpose garbage collector,
excellent character string facilities,
associative arrays,
and strong run-time diagnostics.
Griswold|reference(snobol4 applications),
Gimpel|reference(gimpel snobol4),
and others have documented many ways of
doing things in \*(S4 that can only be accomplished
with much more difficulty in other languages.
.PP
Unfortunately, the control structures of
\*(S4 are archaic, and the fact that
blank is an operator tends to encourage certain types
of errors that are hard to detect.
Other aspects of the language make it a nuisance
to construct large programs out of small parts.
.PP
To ameliorate some of these problems,
we have designed and implemented a new language
that provides some syntactic sugar for \*(S4.
The obvious name for such a language is
.I Snocone .
.PP
The design of the Snocone language was inspired by
Ratfor|reference(kernighan ratfor)
and \s-2EFL\s0|reference(efl cstr).
Like \s-2EFL\s0, and unlike Ratfor, Snocone is a self-contained
programming language, rather than a proper superset of
\*(S4.
Like Ratfor, and unlike \s-2EFL\s0, the Snocone translator
makes no attempt to produce \*(S4 output that is
easy for humans to understand.
.PP
Hanson|reference(ratsno)
has written a similar, but simpler preprocessor
for \*(S4.
His preprocessor is like Ratfor
in that it does not change the statements in
the basic language but rather adds new syntax.
.PP
Griswold|reference(rebus)
has written a preprocessor for a language
similar to Snocone,
the syntax of which is based on Icon|reference(icon language)
rather than on C or \s-2EFL\s0.
His preprocessor is written in C,
and uses Yacc for its parsing.
.PP
In contrast, Snocone has a syntax roughly
based on C and is written in Snocone.
.SH
What's nice about \*(S4
.PP
Variables in \*(S4 are dynamically typed:
the type of any variable is the type of the
value most recently assigned to it.
Thus, declarations are unnecessary, and, in fact,
\*(S4 has no declarations as such
(except that procedures can have local variables).
.PP
All operators and built-in functions check
their argument types;
each argument is converted automatically
to an appropriate type.
A run-time diagnostic message results
if a conversion is impossible.
For instance, if an addend
is a string, \*(S4 will try
to convert it to an integer or real number,
depending on the form of its value.
If the value is inappropriate to convert to
a numeric type, the program will halt.
.PP
Like Lisp, and unlike most other languages,
\*(S4 does not have any explicit mechanism for
returning memory to the system.
Rather, the implementation must detect
when memory is no
longer needed and make that memory available
for re-use.
This makes the language harder to implement,
but easier to use.
One nice by-product of this sort of memory
allocator is that \*(S4 programs tend
to be free of arbitrary size limitations.
.PP
One \*(S4 data type is the
.I pattern ,
which describes an (arbitrarily complex)
set of character strings.
Briefly, the language incorporates a general
top-down backtracking parser.
This parser is so easy to use that it
is the usual way of dealing with strings
in \*(S4.
.PP
Another useful data type is the
.I table .
A table is like a one-dimensional array, except that
its subscripts are not limited to being integers.
For instance, a compiler symbol table might be maintained
as a \*(S4 table, with the subscript being the identifier
and the value being some structure which holds the desired
information about that identifier.
.PP
An interesting and unusual semantic idea in \*(S4 is
that of
.I "statement failure" .
Many operations can easily encounter conditions that
preclude their execution.
For example, an attempt may be made to access a non-existent
array element, read beyond the last record of a file,
search a string for a pattern that it does not contain,
or even make a comparison that gives an unexpected result.
When this happens, the operation
.I fails .
Usually, the failure of an operation implies the failure
of the statement of which it is a part.
While statement failure is not an error condition,
it can be tested.
In fact, conditional transfer on the success or
failure of a statement is virtually the only
kind of control structure in \*(S4.
.PP
A good implementation of \*(S4 is
Macrospitbol, by Dewar and McCann|reference(spitbol).
This is a portable threaded-code interpreter, which
is fast and robust enough to be used for ``production''
purposes.
For example, on a VAX-11/780 it translates about 150
statements per second and executes about 5,000.
.PP
\*(S4 implementations in general have excellent
run-time diagnostic facilities.
The language definition includes ways of tracing
programs for debugging, and of trapping almost any
kind of run-time error.
.SH
What's not nice about \*(S4
.PP
Here is a simple \*(S4 program to add the
integers between 1 and 1000:
.P1 0
	sum = 0
	term = 1
loop	sum = sum + term
	term = term + 1
	LE(term,1000)		:s(loop)
	OUTPUT = "The sum is " sum
END
.P2
.PP
The first two lines of this program assign values to
variables.
The name
.CW loop
in the third line is a label; it is recognized as such
because it begins the line without any white space
ahead of it.
.PP
The fifth line calls a built-in function to compare the
value of the variable
.CW term
to 1000.
\*(S4 has no comparison operators: all comparisons are
done by
.I predicate
functions that either return the null string if the condition
being tested is true or fail if the condition is false.
The \f(CW:s(loop)\fP at the end of the line
is a conditional
.B "go to" :
it causes control to transfer to the label
.CW loop
if the statement succeeds, and to the following
statement if it fails.
.PP
In contrast, the program looks like this in Snocone:
.P1 0
sum = 0
term = 1
do {
	sum = sum + term
	term = term + 1
} while (term <= 1000)
OUTPUT = "The sum is " && sum
.P2
.PP
In the past fifteen years, the trend in programming
language design has been overwhelmingly toward
programming languages with control structures that
make it reasonable to write programs with at most
a very few
.B "go to"
statements.
\*(S4 exhibits almost the exact opposite of this trend:
essentially the only control structure in \*(S4 is
a form of conditional
.B "go to"
statement.
There is no block structure, and all labels are global.
Thus, writing a \*(S4 program requires a constant effort
to invent new label names, and to choose names that have
at least some chance of telling the reader whether their
use is local or global.
.PP
The programmer's job is not made any easier by the peculiar way
\*(S4 handles subroutines.
\*(S4 subroutines are defined at run time by a built-in
function called \f(CWDEFINE\fP.
Its argument is a
.I prototype
of the subroutine to be defined \-
a character string that describes how the
subroutine is to be used.
The subroutine's entry point is the label with the same name.
Because the statement bearing this label is not special
in any other way, it must be placed where it will not
be executed in the ordinary course of events.
The call to \f(CWDEFINE\fP, on the other hand, must
be executed before the subroutine it defines can be used.
.PP
This leads \*(S4 programmers into contortions.
To keep the \f(CWDEFINE\fP call near the subroutine body,
one must invent yet another label:
.P1 0
        DEFINE("square(x)")	:(square.end)
square  square = x * x		:(RETURN)
square.end
.P2
Alternatively, one can put all the \f(CWDEFINE\fP calls
in one place, at the cost of moving the body of each
subroutine arbitrarily far from where its prototype appears.
.PP
The Snocone programmer has an easier job:
.P1 0
procedure square (x) {
	return x * x
}
.P2
.SH
Motivation
.PP
Snocone's purpose, then, is to make it easy for a
programmer used to a block-structured language like C
to write programs that have the freedom and semantic flexibility
of \*(S4.
To this end, we have changed the \*(S4 language in several ways.
.PP
First, we introduced an explicit concatenation operator.
\*(S4 uses blank for concatenation, so
.P1 0
y = f(x)
.P2
is a function call, but
.P1 0
y = f (x)
.P2
assigns to \f(CWy\fP the concatenation of the 
values of the variables
\f(CWf\fP and \f(CWx\fP.
We chose \f(CW&&\fP to represent concatenation because
\*(S4 programs often use concatenation for the same purpose
that C programs use \f(CW&&\fP.
.PP
Second, we allow much the same freedom with spaces that
C does, with one exception:  newline ends a statement.
Statements may be continued onto multiple lines in much the
same way as in \s-2EFL\s0 programs: a line is continued
if it ends with an
operator or some kind of open bracket.
.PP
Third, we have added procedure and structure declarations.
These things are accomplished in \*(S4 by calling built-in
subroutines, and the context of the calls is often obscure.
.PP
Finally, we have introduced control structures similar to
those in C and other block-structured languages: the
\f(CWif\fP, \f(CWwhile\fP, \f(CWfor\fP, and \f(CWdo\fP statements.
.NH 1
Language Description
.NH 2
Lexical Conventions
.PP
Blanks (spaces and tabs, but not newlines) may not appear within a
token, but may freely separate tokens.
Comments begin with a \f(CW#\fP
character and end at the end of the line.
.PP
Constants may be integers, reals, or strings.
There are no signed
numeric constants.
Integers range from 0 to $2 sup 31 - 1$.
Real constants
are short-precision only, and must contain either a decimal point or an
\f(CWE\fP (or \f(CWe\fP).
String constants may be delimited by single or double quotes
with the same meaning.
Characters in quotes receive no special
interpretation.
A single quote may appear in strings delimited by
double quotes, and vice versa.
.EQ
delim off
.EN
.PP
Identifiers may be of any length.
All characters in an identifier are
significant, but their case is presently not significant.
Case may
become significant in the future.
An identifier is a non-empty
sequence of letters and digits, beginning with a letter.
Underscore
(_) is a letter.
The same identifier may be used independently to
represent a procedure, label, or variable.
.NH 2
Statement Separation
.PP
Snocone delimits statements much like the Shell|reference(bstj shell)
(the command interpreter in the
.UX
operating system)|reference(unix cacm ritchie thompson)
or \s-2EFL\s0.
A statement is
ended either by a semicolon or newline, except that if the last token
on a line is an operator or open parenthesis or bracket, the next line
is automatically considered as part of the current statement.
Newlines
may also separate clauses of compound statements, so
.P1 0
if (a < 0)
	a = 0
.P2
is acceptable and means the same as
.P1 0
if (a < 0) a = 0
.P2
or, spreading things as much as possible:
.P1 0
if (
a <
0)
a =
0
.P2
.NH 2
File Inclusion
.PP
The contents of an arbitrary file can be incorporated into
the program by writing an
.I "include line"
in one of the following four forms:
.P1 0
#include "\fIfile\fP"
#include '\fIfile\fP'
#include <\fIfile\fP>
#include {\fIfile\fP}
.P2
Spaces may appear between \f(CW#\fP and \f(CWinclude\fP.
The contents of the named file are substituted for
the \f(CWinclude\fP line.
.PP
The exact behavior of an \f(CWinclude\fP line depends
on the delimiters surrounding the file name.
If quotes are used (single or double), the file is
sought in the current directory.
If brackets are used (angle brackets or curly braces),
the file is sought in an installation-dependent
system library
(\f(CW/usr/lib/snocone\fP on our systems).
If double quotes or angle brackets are used, the
file is included unconditionally, even if the same
file is included several times.
If single quotes or curly braces are used,
the file will only be included once, even if it
is named in several \f(CWinclude\fP lines.
.PP
This latter behavior is useful in the following sort of situation.
Suppose that procedure
.I proc1
is defined in
file
.I proc1.h ,
and procedure
.I proc2
is defined
in file
.I proc2.h .
A program that calls both
.I proc1
and
.I proc2
might then contain:
.P1 0
#include "proc1.h"
#include "proc2.h"
.P2
Now, suppose that
.I proc1
is changed to call
.I proc2 .
If
.I proc1.h
is made to contain
.P1 0
#include "proc2.h"
.P2
then our hypothetical sample program will wind
up with two copies of
.I proc2.h
and will therefore run into trouble with duplicate
procedure definitions.
If, however,
.I proc1.h
contains:
.P1 0
#include 'proc2.h'
.P2
and the main program contains:
.P1 0
#include 'proc1.h'
#include 'proc2.h'
.P2
then
.I proc2
will be defined only once.
.NH 2
Expression Evaluation, Success, and Failure
.PP
Evaluating an expression has one of three
outcomes: a value, failure, or error.
.PP
Errors normally result in a diagnostic message
and termination of the program, and arise
from the usual sorts of things:
overflow, underflow,
division by zero, impossible conversions,
running out of memory, and so on.
.PP
Failure, on the other hand, is not an error
condition.
An expression that fails is merely one that
does not yield a value.
Examples of expressions that fail include
attempts to refer to non-existent array
elements,
attempts to read beyond end of file,
and even comparisons that yield unexpected results.
For instance, the expression
.P1 0
a < b
.P2
yields a null string if \f(CWa\fP is indeed less than \f(CWb\fP,
and fails otherwise.
.PP
Most operators fail if any of their operands fails.
The few exceptions will be noted below.
.PP
An expression that always either yields a null string or fails
is sometimes called a
.I predicate .
Testing such expressions for success or failure in \f(CWif\fP,
\f(CWwhile\fP, and \f(CWdo\fP statements is the primary way
of affecting the flow of control
in Snocone.
.NH 2
Data Types, Declarations, and Scope
.PP
Variables in Snocone are dynamically typed: a variable has the type of
the value most recently assigned to it.
Except for a few predefined variables
(described under
.I "pattern matching" ),
all variables have the null string as their initial values.
All variables are global, but procedures
can nominate variables whose
values will automatically be saved at
entry and restored at exit.
The only declarations define
structures:
.P1 0
struct cons {car, cdr}
.P2
In effect, this declaration defines three procedures named
\f(CWcons\fP, 
\f(CWcar\fP,
and \f(CWcdr\fP.
The value of \f(CWcons(a,b)\fP is a newly-created
\f(CWcons\fP object with \f(CWa\fP and \f(CWb\fP
as the values of its \f(CWcar\fP and \f(CWcdr\fP fields,
respectively.
The fields, in turn, are accessed by similarly-named
functions.
For instance, the \f(CWcdr\fP field of \f(CWcons\fP structure \f(CWx\fP
is accessed as \f(CWcdr(x)\fP.
Thus this program:
.P1 0
struct cons {car, cdr}
a = cons (3, cons (4, 5))
OUTPUT = car (cdr (a))
.P2
prints \f(CW4\fP.
The procedures corresponding to field names may
be used as the target of an assignment:
.P1 0
car (a) = "Hello"
.P2
Snocone offers other data types than those described above.
While
numeric constants cannot be negative, variables and expressions
suffer from no such restriction.
Strings may be of any length up
to an implementation-defined upper bound, usually many thousands
of characters.
All variables have the null string as their
initial value.
.PP
Aggregate values come in two types: arrays and tables.
Each type
is created by a built-in procedure with the same name.
Thus:
.P1 0
a = ARRAY (20)
.P2
creates a 20-element array, initializes each element to the null
string, and assigns it to a.
The second argument to the array
procedure is an initializing value:
.P1 0
a = ARRAY (20, -1)
.P2
makes a an array of 20 elements, each with value \-1.
To get
multi-dimensional arrays, express the dimensions as a string:
.P1 0
a = ARRAY ('4,5', -1)
.P2
One can also give explicit lower bounds:
.P1 0
a = ARRAY ('-10:10')
.P2
The default lower bound is 1.
.PP
Array elements are referenced by Algol-like subscripts:
.P1 0
a[i,j] = a[i,j] + b[i,k] * c[k,j]
.P2
Each array element behaves as a variable, and can therefore have
a value of any data type, independently of any other element.
.PP
A table is like a one-dimensional array whose subscripts are not
restricted to integers:
.P1 0
t = TABLE (20)
.P2
The argument to the \f(CWTABLE\fP procedure is an estimate of the maximum
number of elements that will actually be stored in the table.
If substantially more elements than
this are stored, access will
begin to slow down.
On the other hand, giving too large an
initial value wastes space.
Once a table has been created, any
value can be used for a subscript.
\f(CWt[a]\fP and \f(CWt[b]\fP will refer to
the same element if and only if a and b are identical.
The
precise meaning of ``identical'' is given with the description of
the \f(CW::\fP and \f(CW:!:\fP operators;
suffice it to say that \f(CW3\fP and \f(CW"3"\fP are
not identical, and that after executing
.P1 0
a = b
.P2
\f(CWa\fP and \f(CWb\fP are
identical regardless of what values they had before.
.NH 2
Binary Operators
.PP
The following operators are grouped in order of decreasing priority.
Unless otherwise stated, they are left-associative.
.IP "\f(CW. $\fP"
Pattern value assignment (see Patterns)
.IP "\f(CW^\fP"
Exponentiation (right-associative).
The right argument
must be an integer.
If both arguments are integers,
the right argument must be non-negative
.IP "\f(CW* / %\fP"
.br
Multiplication, division, and remainder.
As in C,
and as not in \*(S4, multiplication and
division have the same precedence.
.IP "\f(CW+ -\fP"
Addition and subtraction.
.IP "\f(CW== != < > <= >= :==: :!=:\fP"
.IP "\f(CW:<: :>: :<=: :>=: :: :!:\fP"
.br
Comparison predicates.
Each of these operators returns
a null string if the indicated relation holds,
and fails if not.
The first six do
numeric comparisons: an error results if either
operand cannot be converted to a number.
The next six do string comparisons: an error results
if either operand cannot be converted to a string.
The last two test if the two operands are identical.
Values of different data types are never identical.
Strings and numbers are identical if their data types and values match.
Other values are identical if they refer to the same object.
.IP "\f(CW&&\fP"
Concatenation.
The left operand is evaluated
first; if its evaluation fails, the \f(CW&&\fP operator fails.
Otherwise, the right operand is
evaluated, and \f(CW&&\fP fails if the right operand fails.
If either operand is the null string, the
result is the other operand, even if that operand is not a string.
Otherwise, both operands are
converted to strings and the result is their concatenation.
An error results if either
operand cannot be converted to a string.
Note that if the operands of \f(CW&&\fP are predicates,
\f(CW&&\fP can be used as a kind of logical conjunction.
.IP "\f(CW||\fP"
Logical disjunction.
The left operand is evaluated first; if it
succeeds, its value is the value of \f(CW||\fP.
Otherwise, the value is that of the right operand.
If both operands fail, \f(CW||\fP fails.
.IP "\f(CW|\fP"
Pattern alternation.
See
.I "pattern matching"
for details.
.IP "\f(CW=\fP"
Assignment.
This operator is right-associative.
.IP "\f(CW?\fP"
Pattern match operator.
The left operand is
converted to a string and searched for the first
substring that matches the pattern given by the right operand.
If no such substring is found, the operator fails.
If the left operand is a variable, \f(CW?\fP may be used on the left
side of an assignment.
.NH 2
Unary Operators
.PP
Unary operators bind more tightly than all binary
operators.
.IP "\f(CW+ -\fP"
Unary plus and minus.
The result is always
numeric, so unary plus is sometimes used for type conversion.
Because unary operators bind so tightly, expressions that look like negative
constants behave that way for all practical purposes.
.IP "\f(CW.\fP"
Name operator.
The operand must be a variable;
the result is essentially a pointer to the variable, similarly
to the unary & operator in C.
The result of applying the \f(CWDATATYPE\fP function
to the result of the \f(CW.\fP operator is the string \f(CW"NAME"\fP.
.IP "\f(CW$\fP"
Indirection.
The operand is converted to a name;
the result is the object thus named.
\f(CW$\fP always yields an lvalue.
.IP "\f(CW?\fP"
Query.
If its operand fails, \f(CW?\fP fails.
If its operand succeeds, \f(CW?\fP yields a null string.
Useful for evaluating an expression solely for
its side effects.
.IP "\f(CW~\fP"
Logical negation.
If its operand fails, \f(CW~\fP yields a null string.
If the operand succeeds, \f(CW~\fP fails.
.IP "\f(CW&\fP"
Keyword value.
The operand must be the name of one of a restricted set of variables.
The lvalue result is a system variable, whose value affects the
execution of the program in some way.
System variables are discussed separately.
.IP "\f(CW@\fP"
Pattern cursor assignment.
See
.I "pattern matching"
for details.
.IP "\f(CW*\fP"
Deferred evaluation.
Returns a value of type \f(CWEXPRESSION\fP that
contains all the information necessary to evaluate the operand.
The operand is not actually evaluated at this time, but can be
evaluated later,
either by the \f(CWEVAL\fP procedure or implicitly during pattern matching.
.NH 2
Statements
.PP
Elements in brackets are optional.
If the description of a statement
is split over more than one line, the statement itself may be split
analogously.
Snocone does not have a null statement.
.Sx
\fIexpression\fP
.Sy
.Tx
The expression is evaluated for its side effects.
The result, if any, is discarded.
.Tm if	S
.Ty
.Sx
if (\fIexpression\fP)
	\fIstatement1\fP
[ else
	\fIstatement2\fP ]
.Sy
.Tx
The parenthesized
.I expression
is evaluated.
If it
succeeds,
.I statement1
is executed, otherwise
.I statement2
is executed.
.Ty
.Sx
while (\fIexpression\fP)
	\fIstatement\fP
.Sy
.Tx
Behaves similarly to C: the
.I expression
is evaluated,
and if it succeeds, the
.I statement
is executed and
control passes back to the beginning of the \f(CWwhile\fP
statement.
Unlike C, Snocone has no
.I break
or
.I continue
statements.
.Ty
.Sx
do
	\fIstatement\fP
while (\fIexpression\fP)
.Sy
.Tx
Behaves similarly to C: the
.I "statement"
is executed, then the
.I expression
is evaluated, and if the
.I expression
succeeds, control passes back to the beginning of the \f(CWdo\fP
statement.
.Tm for	S
.Ty
.Sx
for (\fIexpression1\fP, \fIexpression2\fP, \fIexpression3\fP)
	\fIstatement\fP
.Sy
.Tx
Equivalent to:
.P1 0
\fIexpression1\fP
while (\fIexpression3\fP) {
	\fIstatement\fP
	\fIexpression2\fP
}
.P2
.Ty
.Sx
{
	\fIstatement list\fP
}
.Sy
.Tx
The statements in the list, which may contain zero or
more statements, are executed in sequence.
.Ty
.Sx
\fIlabel\fP: \fIstatement\fP
.Sy
.Tx
All labels are global (because all \*(S4
labels are global), even across procedure boundaries,
so they must be chosen with care.
A useful convention is
to begin a label inside a procedure body with the name
of the procedure and an underscore.
.Ty
.Sx
go to \fIlabel\fP
.Sy
.Tx
The space between \f(CWgo\fP and \f(CWto\fP is optional.
It is wise
not to jump from a point inside one procedure into another.
Program
execution may be terminated by jumping to the reserved
label \f(CWEND\fP.
Labels \f(CWRETURN\fP, \f(CWFRETURN\fP, \f(CWNRETURN\fP, \f(CWABORT\fP, and
\f(CWCONTINUE\fP are also reserved.
.Ty
.Sx
return [ \fIexpression\fP ]
.Sy
.Tx
The current procedure returns to its caller.
If the
.I expression
is given, the procedure yields that value.
If no
.I expression
is given, the value returned is that of
the variable with the same name as the procedure; if
that variable was not assigned in the procedure, the
null string is returned.
.Ty
.Sx
freturn
.Sy
.Tx
The current procedure returns and fails.
.Ty
.Sx
nreturn [ \fIexpression\fP ]
.Sy
.Tx
The current procedure returns \f(CW$(\fP\fIexpression\fP\f(CW)\fP as an
lvalue.
If the
.I expression
is not given, the indirection
is applied to the variable with the same name as the
procedure.
An error results if this variable was not
given a value inside the procedure.
.NH 2
Procedures
.PP
Here is an example of a procedure declaration:
.P1 0
procedure gcd (m, n) {
	while (m != n) {
		if (m > n)
			m = m % n
		else
			n = n % m
	}
	return m
}
.P2
.PP
Arguments are passed by value, but note that the value passed for
an aggregate argument (array, table, or structure) is really a
pointer to the aggregate itself.
.PP
If a procedure is called with too few arguments, extra null strings
are supplied as necessary.
If called with too many arguments, the extras
are quietly ignored.
.PP
Local variables can be nominated for a procedure:
.P1 0
procedure f(x) y, z {...
.P2
All variables are global, but name scoping is dynamic.
One way to look
at it is to imagine that when a procedure
is entered, all the variables defined in that procedure are saved and
set to null.
Those variables are then restored when the procedure returns.
.PP
Thus, the following example prints 5 and then 1:
.P1 0
a = 1
f()
g()

procedure f() a {
	a = 5
	g()
}

procedure g() {
	OUTPUT = a
}
.P2
The following example also prints 5 and then 1:
.P1 0
a = 1
b = .a
f()
g()
.P3
procedure f() a {
	a = 5
	g()
}
.P3
procedure g() {
	OUTPUT = $b
}
.P2
Local variables can only be associated with procedures.
Procedures are
recursive.
.NH 2
Input-Output
.PP
All I/O is done through ``associated variables''.
A variable may be
input-associated or output-associated (or both).
Whenever a value is
assigned to an output-associated variable, that value is automatically
written in the file associated with that variable.
Whenever a value
is requested for an input-associated variable, a line is read from the
file associated with that variable, and the contents of the line are
used for the value.
When the end of an input file is reached,
any attempts to access variables associated
with that file will fail.
.PP
Initially, the variable \f(CWOUTPUT\fP is output-associated with the standard
output file, the variable \f(CWINPUT\fP is input-associated with the standard
input file, and the variable \f(CWTERMINAL\fP is both input-
and output-associated
with the user's terminal.
.PP
Thus, the following program copies its standard
input to its standard output, a line at a time:
.P1 0
while (OUTPUT = INPUT) {}
.P2
.PP
New associations are formed by the \f(CWINPUT\fP
and \f(CWOUTPUT\fP procedures:
.P1 0
INPUT  (\fIname\fP, \fIchannel\fP, \fIfile\fP)
OUTPUT (\fIname\fP, \fIchannel\fP, \fIfile\fP)
.P2
In both cases, 
.I name
is the name of the variable to be associated (the
name of the variable
.I x
is \f(CW.x\fP), and
.I file
is the file to be used.
.I Channel
is a string that you will use to identify subsequent operations
on that file.
Internally, there is a one-to-one correspondence between
channel names and file descriptors.
.PP
The file argument must not be given for other than the first call to
\f(CWINPUT\fP or \f(CWOUTPUT\fP on a given channel, as a channel can only be
connected to a single file.
.PP
Other procedures dealing with input-output are:
.Tm SET	S
.Sx
SET (\fIchannel\fP, \fIoffset\fP, \fIwhence\fP)
.Sy
.Tx
This function repositions the file specified by its
first argument in a system-dependent manner.
In implementations running under the
.UX
system, the behavior is similar to the
.I lseek
system call.
.Ty
.Sx
REWIND (\fIchannel\fP)
.Sy
.Tx
Repositions the named channel to the beginning of the file.
.Ty
.Sx
ENDFILE (\fIchannel\fP)
.Sy
.Tx
Indicates that you are done using the given channel.
All variables associated through that channel are
disassociated, output buffers are flushed (if any),
and the file is closed.
.Ty
.Sx
DETACH (\fIname\fP)
.Sy
.Tx
Disassociates the named variable.
Does not close
the file: a later call to \f(CWINPUT\fP or \f(CWOUTPUT\fP can reassociate it.
.NH 2
Pattern Matching
.PP
A
.I pattern
is a data structure that describes a class of strings.
Patterns
are used by the \f(CW?\fP operator,
which determines if the string given as its
left operand contains a substring described by the pattern given as its
right operand.
The part of the Snocone system that
does this is called the
.I scanner .
The input to the scanner is the string to be searched,
called the
.I subject ,
and the pattern sought.
.PP
The scanner tries to find a substring of the
subject that is matched by the pattern.
It first looks at substrings starting at
the first character of the subject.
If it doesn't find one, it tries the ones
starting at the second subject character,
and so on.
If the scanner cannot find an appropriate
substring, the pattern match fails.
.PP
If the \f(CW&ANCHOR\fP system variable is
nonzero, the scanner only looks at substrings
that start at the first character of the
subject.
This is said to be an
.I anchored
pattern match.
The \f(CW&ANCHOR\fP variable is zero at the
start of program execution.
.PP
The important part of pattern matching is therefore
determining whether the subject contains a
substring that starts at a given character
and matches a given pattern.
To understand how this is done, we must
take a closer look at patterns.
.PP
Every pattern is a concatenation of one or more
.I elements ,
each of which has zero or more
.I alternatives .
Each alternative may itself be an arbitrarily complicated pattern.
Some patterns have alternatives that are determined
dynamically as they are needed during pattern matching.
.PP
If a pattern has only one element, the scanner determines
if it matches at a given point by trying to match each
of the pattern's alternatives at that point.
If no alternative matches at that point, the element
cannot be matched.
.PP
If the pattern has more than one element,
matching that pattern at a given point means:
(a) finding an alternative for the first element
of the pattern that matches a subject substring
starting at the given point, and that also (b) allows
the remaining elements of the pattern to match
a substring that starts immediately after the
substring matched in (a).
.PP
During pattern matching, the scanner keeps its place
by means of an internal value called the
.I cursor ,
which represents the number of characters in the
subject that precede the current location.
These characters are counted from the beginning
of the subject even when the scanner is trying
to match a substring that starts at some other place
in the subject.
.PP
The concatenation of two patterns \f(CWP1\fP and \f(CWP2\fP
is a pattern whose elements
are \f(CWP1\fP and \f(CWP2\fP.
The alternation operator \f(CW|\fP similarly constructs alternatives.
When a string is used as a pattern, it matches only itself.
Thus,
.P1 0
"a" | "e" | "i" | "o" | "u"
.P2
is a pattern that matches any lower-case vowel.
.PP
Consider the following
statement:
.P1 0
P1 = ("ab" | "a") && ("b" | "c")
.P2
This assigns to \f(CWP\fP a pattern with two elements.
The first one matches
either \f(CWab\fP or \f(CWa\fP, and the second matches either
\f(CWb\fP or \f(CWc\fP.
Look at:
.P1 0
"ab" ? P1
.P2
The scanner first tries the first alternative of the first element of
P1 by matching \f(CWab\fP in the subject with \f(CWab\fP in the pattern.
This alternative matches, so the element (\f(CW"ab"|"a"\fP) matches, and
the scanner goes on to the second element (\f(CW"b"|"c"\fP).
Here, it will be unsuccessful
in matching both \f(CWb\fP and \f(CWc\fP,
because it has already exhausted all the
characters of the subject.
Thus the scanner fails to match the second element of
\f(CWP1\fP, and must back up and rematch the first element.
Fortunately, the
first element has an alternative in \f(CWa\fP;
this matches, and now the scanner
can try to match the second element (\f(CW"b"|"c"\fP) again.
The first alternative (\f(CWb\fP) succeeds, so
the \f(CW?\fP operator therefore succeeds.
If we wanted to examine just how the pattern elements matched,
we could have written:
.P1 0
P1a = ("ab"|"a").OUTPUT && ("b"|"c").OUTPUT
"ab" ? P1a
.P2
This would print \f(CWa\fP and \f(CWb\fP on separate lines,
showing that \f(CW"ab"|"a"\fP matched \f(CWa\fP and
\f(CW"b"|"c"\fP matched \f(CWb\fP.
.PP
When any pattern is first encountered
during pattern matching, it will match some string, and when the scanner
backs into it, it may match some other string.
Because many patterns behave
differently on their initial match than when rematched, it is useful to
describe the two cases separately.
.PP
For instance, when a string is used as a pattern, it initially matches
itself.
If the scanner later backs into it, it fails.
.PP
As another example, consider \f(CWP1|P2\fP.
This pattern matches every
possibility for \f(CWP1\fP, and when it has exhausted \f(CWP1\fP,
then matches every
possibility for \f(CWP2\fP before finally failing.
.PP
With this in mind, we can describe the various pattern-matching operators,
procedures, and pre-defined variables:
.IP "\f(CWP1 && P2\fP" 10
Tries to match \f(CWP1\fP, fails if it can't.
Then tries to match \f(CWP2\fP.
If \f(CWP2\fP fails, tries the next alternative for
\f(CWP1\fP, and then tries \f(CWP2\fP again.
Eventually, it either
runs out of alternatives for \f(CWP1\fP, in which case \f(CWP1&&P2\fP
fails, or it finds alternatives for both \f(CWP1\fP and \f(CWP2\fP which
allow them to match.
.IP "\f(CWP1 | P2\fP"
Tries to match \f(CWP1\fP.
If successful, \f(CWP1|P2\fP matches the
string that was matched by \f(CWP1\fP.
Otherwise, matches whatever
\f(CWP2\fP matches, and fails if \f(CWP2\fP fails.
.IP "\f(CWP $ V\fP"
Tries to match \f(CWP\fP.
If \f(CWP\fP matches, a copy of the substring
matched by \f(CWP\fP is immediately assigned to the variable \f(CWV\fP.
.IP "\f(CWP . V\fP"
Tries to match \f(CWP\fP.
If the entire pattern match of which
this element is a part is ultimately successful, a copy
of the substring matched by \f(CWP\fP is assigned to \f(CWV\fP after
the entire match completes.
.IP "\f(CW@N\fP"
Matches a null string, and immediately assigns
cursor position to the variable \f(CWN\fP.
The cursor position is
the number of characters that precede this null string
in the subject of the entire pattern match.
Thus:
.P1
"abcde" ? @OUTPUT && "c"
.P2
prints \f(CW0\fP, \f(CW1\fP, and \f(CW2\fP on separate lines.
.IP "\f(CWABORT\fP"
The entire pattern match is aborted.
.IP "\f(CWARB\fP"
A pre-defined variable that matches anything at all.
More specifically, it initially matches the null string.
When backed into, it matches a string one character longer
than the one it matched last time.
.IP "\f(CWBAL\fP"
A pre-defined variable that matches a non-empty parenthesis-
balanced string.
This string is balanced only with respect to parentheses,
and not any other kinds of brackets.
.IP "\f(CWFAIL\fP"
A pre-defined variable that always fails to match.
It
is sometimes useful to force the scanner to try all
alternatives for a pattern.
For instance:
.P1
s ARB $ OUTPUT && FAIL
.P2
prints all substrings of \f(CWs\fP.
.IP "\f(CWFENCE\fP"
Initially matches the null string, but if the scanner
backs into it, the entire pattern match is aborted.
Identical to to \f(CW""|ABORT\fP.
.IP "\f(CWREM\fP"
Matches from the current position to the end of the
subject.
Identical to \f(CWRTAB(0)\fP.
.IP "\f(CWSUCCEED\fP"
Identical to \f(CWARBNO("")\fP
.IP "\f(CWANY(s)\fP"
Matches any single character in \f(CWs\fP.
.IP "\f(CWARBNO(p)\fP"
A pre-defined procedure that yields a pattern that matches
zero or more copies of the pattern \f(CWp\fP.
Initially matches
the null string.
Each time the scanner backs into it,
it tries to extend the substring already matched by
matching one more instance of \f(CWp\fP.
.IP "\f(CWBREAK(s)\fP"
Matches a string starting at the current position up to
but not including the next character in the subject
that is also found somewhere in \f(CWs\fP.
Failure
if no such character is found.
No alternatives.
.IP "\f(CWBREAKX(s)\fP"
Like \f(CWBREAK\fP, but if backed into, it matches up to the
next subject character also found in \f(CWs\fP, and so on.
.IP "\f(CWLEN(n)\fP"
Matches exactly \f(CWn\fP characters.
.IP "\f(CWNOTANY(s)\fP"
Matches any single character not in \f(CWs\fP.
.IP "\f(CWPOS(n)\fP"
If exactly \f(CWn\fP subject characters precede the current
position, \f(CWPOS\fP matches the null string.
Otherwise it
fails.
.IP "\f(CWRPOS(n)\fP"
If exactly \f(CWn\fP subject characters remain to be matched,
\f(CWRPOS\fP matches the null string.
Otherwise it fails.
.IP "\f(CWRTAB(n)\fP"
If at least \f(CWn\fP subject characters remain in the subject,
\f(CWRTAB\fP matches characters until exactly \f(CWn\fP remain.
Otherwise
it fails.
.IP "\f(CWSPAN(s)\fP"
Starting at the current position, matches as many characters
as possible taken from \f(CWs\fP.
.IP "\f(CWTAB(n)\fP"
\f(CWTAB\fP matches from the current position forward up to
and including the \f(CWn\fPth character of the subject.
If more than \f(CWn\fP characters have already been matched
in the subject string, \f(CWTAB\fP fails.
.PP
All pattern-valued procedures can take an unevaluated expression as argument.
The expression will be evaluated when the pattern element is matched.
.SH
System Variables
.PP
The \f(CW&\fP operator looks at the name of its operand,
not its value.
For each of a small set of names, \f(CW&\fP yields a
.I "system variable" ,
whose value affects the operation of the system in some way.
For instance, we have already seen \f(CW&ANCHOR\fP, which
controls whether or not the scanner is restricted to examining
initial substrings of the subject.
The complete list of system variables is:
.IP "\f(CW&ABORT\fP" 13
The same value as the pre-defined variable \f(CWABORT\fP.
.IP "\f(CW&ALPHABET\fP"
A string that contains all the characters of the machine's
collating sequence, in order.
.IP "\f(CW&ANCHOR\fP"
If zero, all pattern matches are unanchored,
otherwise they are anchored.
.IP "\f(CW&ARB\fP"
The same value as the pre-defined variable \f(CWARB\fP.
.IP "\f(CW&BAL\fP"
The same value as the pre-defined variable \f(CWBAL\fP.
.IP "\f(CW&CODE\fP"
This variable is initially zero.
Its value is returned to the operating system when the
program finishes executing.
.IP "\f(CW&DUMP\fP"
This variable is initially zero.
If it is 1 at the end of execution, the values of
all variables are printed.
If it is 2, the values of all array, table, and structure
elements are also printed.
.IP "\f(CW&FAIL\fP"
The same value as the pre-defined variable \f(CWFAIL\fP.
.IP "\f(CW&FENCE\fP"
The same value as the pre-defined variable \f(CWFENCE\fP.
.IP "\f(CW&FNCLEVEL\fP"
The current level of procedure nesting.
.IP "\f(CW&INPUT\fP"
If this variable is set to 0, all
input association is suspended.
.IP "\f(CW&MAXLNGTH\fP"
The maximum length of a string.
This value cannot be increased beyond its
initial value.
.IP "\f(CW&OUTPUT\fP"
If this variable is set to 0, all output is suppressed
until it is again set nonzero.
.IP "\f(CW&REM\fP"
The same value as the pre-defined variable \f(CWREM\fP.
.IP "\f(CW&STCOUNT\fP"
A count of how many statements have been executed so far.
Because this counts \*(S4 statements, not
Snocone statements, it should be considered only approximate.
.IP "\f(CW&STLIMIT\fP"
When \f(CW&STCOUNT\fP reaches this value,
execution terminates with an error message.
It is initially 50000, and may be set freely.
If it is set to a negative value, statement
counting is disabled.
.IP "\f(CW&SUCCEED\fP"
The same value as the pre-defined variable \f(CWSUCCEED\fP.
.NH 1
Examples
.NH 2
Hello World
.PP
This is the canonical sample program:
.P1 0
OUTPUT = "Hello world!"
.P2
The present implementation of Snocone is case-insensitive
in identifiers, but future versions may well become
case-sensitive.
If so, it is likely that pre-defined names, such as
\f(CWINPUT\fP and \f(CWOUTPUT\fP, will have to be written
in upper case.
Keywords, such as \f(CWif\fP and \f(CWwhile\fP, will be
written in lower case.
.NH 2
Topological Sorting
.PP
We now develop
a topological sorting program based on the algorithm
described on page 262 of
.I "Fundamental Algorithms" |reference(knuth volume1).
The reader may wish to compare this program with the
\*(S4 program based on the same algorithm
that appears on pages 221-222 of
.I "The \*(S4 Programming Language" .
.PP
The input is
a set of pairs of objects,
where the first object in each pair
is considered to precede the second.
The output is a list of objects
in a sequence that meets all the constraints
implied by the input.
In other words, the program generates
a total ordering that includes a given
partial ordering.
.PP
If the input contains a loop, the program
will detect this fact and complain.
.PP
For example, given the following input:
.P1
letters alphanum
numbers alphanum
blanks optblanks
numbers real
numbers integer
letters variable
alphanum variable
binary binaryop
blanks binaryop
unqalphabet dliteral
unqalphabet sliteral
sliteral literal
dliteral literal
integer literal
real literal
.P2
the program will produce the following output:
.P1
letters
numbers
blanks
binary
unqalphabet
alphanum
real
integer
optblanks
binaryop
dliteral
sliteral
variable
literal
.P2
.PP
The basic strategy of the program is simple:
for each object, we remember how many immediate predecessors
it has, and store a list of all its immediate successors.
When we have finished reading the input,
we can immediately output the objects that have
no predecessors.
Each time we output an object, we remove it from
the data structure and decrement the predecessor
count of each of its immediate successors.
.PP
We will eventually reach a state in which we run
out of objects without predecessors.
When that happens, we are done.
If any objects remain, they form a loop.
.PP
To reduce the time spent searching for objects
without predecessors, we keep a queue of such
objects.
We must also keep a queue of the successors
of each object.
In both cases, we could use a stack instead of
a queue, but using a queue tends to favor the
order in which objects appeared in the input,
which makes the output more intuitively useful.
.PP
The following structure declarations and subroutines
manipulate queues.
A queue consists of a header (of type \f(CWqueue\fP)
which points to the first and last elements of
a singly-linked list of queue elements (of type \f(CWqel\fP).
.P1 0
struct queue {head, tail}
struct qel {obj, link}

.P3
procedure enqueue (q, x) y {
	if (head(q) :: "")
		head(q) = tail(q) = qel(x)
	else {
		y = qel(x)
		link(tail(q)) = y
		tail(q) = y
	}
}
.P3

procedure dequeue (q) x {
	if (head(q) :: "")
		freturn
	x = head(q)
	if ((head(q) = link(x)) :: "")
		tail(q) = ""
	return obj(x)
}
.P2
.PP
By convention, we use the null string to indicate
the end of a list.
This is convenient because uninitialized variables
and structure fields and missing arguments are automatically set to
the null string.
Thus, the test
.P1
head(q) :: ""
.P2
is a convenient way of testing whether \f(CWhead(q)\fP
has been set or not.
.PP
We represent each object as a structure containing
the object's name (so we can print it), the count of
immediate predecessors, and the queue of successors:
.P1
struct object {name, count, suc}
.P2
.PP
Since we will be reading the names of objects
rather than the objects directly, we will need
to map names to objects.
This can easily be done with a table and a mapping subroutine
that creates elements in the table as needed:
.P1 0
namemap = TABLE()
objects = queue()

procedure getobj (name) {
    if ((getobj = namemap[name]) :: "") {
        getobj = namemap[name] =
            object (name, 0, queue())
        enqueue (objects, getobj)
        nobj = nobj + 1
    }
}
.P2
.PP
This procedure uses the feature that if no return value
is explicitly given, the value of the variable with the
same name as the procedure is used.
If an appropriate table element already
exists, the \f(CW::\fP operator will fail
and the value of \f(CWgetobj\fP will be the
value retrieved from the table.
As a side effect, we maintain a global queue of
all known objects in \f(CWobjects\fP and
count them in \f(CWnobj\fP.
.PP
Now that we can map from names to objects, it is
an easy matter to enter a new relation into
our data structure.
Procedure \f(CWenter\fP takes the
names of two objects:
.P1 0
procedure enter (p, q) {
	p = getobj (p)
	q = getobj (q)
	count(q) = count(q) + 1
	enqueue (suc(p), q)
}
.P2
.PP
We first locate the objects to which \f(CWp\fP
and \f(CWq\fP refer, creating them if necessary.
Since \f(CWp\fP precedes \f(CWq\fP, we increment
the predecessor count of \f(CWq\fP and append
\f(CWq\fP to the successor list of \f(CWp\fP.
.PP
Building the data structure is now just a matter of
scanning the input file:
.P1 0
while (line = INPUT) {
   if (line ? FENCE && BREAK(' ').p &&
		SPAN(' ') && rem.q)
        enter (p, q)
    else
        TERMINAL = "bad input line: " && line
}
.P2
.PP
The pattern match assumes that everything up to the first
blank in the input line is the first object in a
relation, and everything after the first blank is
the second object.
If the match fails, the program complains.
.PP
Once all the input has been read, we must initialize
the queue of minimal objects (objects without predecessors).
This was the reason for keeping a queue of all objects,
which is now destroyed to build the queue of minimal objects.
It is not necessary to destroy the queue, but it is more
convenient because it is then possible to use \f(CWdequeue\fP:
.P1 0
zeroes = queue()

while (x = dequeue (objects))
	if (count (x) == 0)
		enqueue (zeroes, x)
.P2
.PP
As long as there is a minimal object,
we can print its name, delete it, and
decrement the predecessor count of each
of its successors.
If we decrement a predecessor count to
zero, that object is now minimal.
.P1 0
while (x = dequeue (zeroes)) {
    nobj = nobj - 1
    OUTPUT = name(x)
    while (y = dequeue (suc (x))) {
        if ((count(y) = count(y) - 1) == 0)
            enqueue (zeroes, y)
    }
}
.P2
.PP
This loop runs until there are no more minimal objects.
If there are still elements remaining (\f(CWnobj\fP is nonzero),
then those elements form a loop.
.P1 0
if (nobj != 0)
    TERMINAL = "The ordering contains a loop."
.P2
.NH 1
References
.LP
|reference_placement