Mini-Unix/usr/doc/ctut/ct3

.NH
Increment and Decrement Operators
.PP
In addition to the usual
`\(mi',
C also has two other interesting
unary
operators, `++' (increment) and `\(mi\(mi' (decrement).
Suppose we want to count the lines
in a file.
.E1
main( ) {
	int c,n;
	n = 0;
	while( (c=getchar( )) != '\\0' )
		if( c \*= '\\n' )
			\*+n;
	printf("%d lines\\n", n);
}
.E2
.UL \*+n 
is equivalent to
.UL n=n+1
but clearer,
particularly when
.UL n
is a complicated expression.
`++' and `\(mi\(mi' can be applied only to
.UL int's
and
.UL char's
(and
.UL pointers
which we haven't got to yet).
.PP
The unusual feature of `++' and `\(mi\(mi' is that they
can be used either before or after a variable.
The value of
.UL \*+k
is the value of
.UL k
.ul
after
it has been incremented.
The value of
.UL k\*+
is
.UL k
.ul
before
it is incremented.
Suppose
.UL k
is 5.
Then
.E1
x = \*+k;
.E2
increments
.UL k
to 6 and then sets
.UL x
to
the resulting value,
i.e., to 6.
But
.E1
x = k\*+;
.E2
first sets
.UL x
to
to 5,
and
.ul
then
increments
.UL k
to 6.
The incrementing effect of
.UL \*+k
and
.UL k\*+
is the same,
but their values are respectively 5 and 6.
We shall soon see examples where both of these uses are important.
.NH
Arrays
.PP
In C, as in Fortran or PL/I, it is possible to make arrays
whose elements are basic types.
Thus we can make an array of 10
integers
with the declaration
.E1
int x[10];
.E2
The square brackets
mean
.ul
subscripting;
parentheses are used only for function references.
Array indexes begin at 
.ul
zero,
so the elements of
.UL x
are
.E1
x[0], x[1], x[2], \*.\*.\*., x[9]
.E2
If an array has
.UL n
elements, the largest subscript is
.UL n\(mi1\*.
.PP
Multiple-dimension arrays are provided,
though not much used above two dimensions.
The declaration and use look like
.E1
int name[10] [20];
n = name[i+j] [1] + name[k] [2];
.E2
Subscripts can be arbitrary integer expressions.
Multi-dimension arrays are stored by row (opposite to Fortran),
so the rightmost subscript varies fastest;
.UL name
has 10 rows and 20 columns.
.PP
Here is a program which reads a line,
stores it in a buffer,
and prints its length (excluding the newline at the end).
.E1
.ne 8
main( ) {
	int n, c;
	char line[100];
	n = 0;
	while( (c=getchar( )) != '\\n' ) {
		if( n < 100 )
			line[n] = c;
		n\*+;
	}
	printf("length = %d\\n", n);
}
.E2
.PP
As a more complicated problem,
suppose we want to print the count for each line in the input,
still storing the first 100 characters of each line.
Try it as an exercise before looking at the solution:
.E1
main( ) {
	int n, c; char line[100];
	n = 0;
	while( (c=getchar( )) != '\\0' ) 
		if( c \*= '\\n' ) {
			printf("%d\n", n);
			n = 0;
		}
		else {
			if( n < 100 ) line[n] = c;
			n\*+;
		}
}
.E2
.NH
Character Arrays; Strings
.PP
Text is usually kept as an array of characters,
as we did with
.UL line[ 
.UL ]
in the example above.
By convention in C,
the last character in a character array should be a `\\0'
because most programs that manipulate character arrays
expect it.
For example,
.UL printf
uses the `\\0' to detect the end of a character array
when printing it out with a `%s'.
.PP
We can copy a character array
.UL s
into another
.UL t
like this:
.E1
	i = 0;
	while( (t[i]=s[i]) != '\\0' )
		i\*+;
.E2
.PP
Most of the time we have to put in our own `\\0' at the end of a string;
if we want to print the line with
.UL printf,
it's necessary.
This code prints the character count before the line:
.E1
main( ) {
	int n;
	char line[100];
	n = 0;
	while( (line[n\*+]=getchar( )) != '\\n' );
	line[n] = '\\0';
	printf("%d:\\t%s", n, line);
}
.E2
Here we increment 
.UL n
in the subscript itself,
but only after the previous value has been used.
The character is read, placed in 
.UL line[n],
and only then
.UL n
is incremented.
.PP
There is one place and one place only
where C puts in the `\\0' at the end of a character array for you,
and that is in the construction
.E1
"stuff between double quotes"
.E2
The compiler puts a `\\0' at the end automatically.
Text enclosed in double quotes is called a
.ul
string;
its properties are precisely those of an (initialized) array of characters.
.NH
For Statement
.PP
The
.UL for
statement is a somewhat generalized
.UL while
that lets us put the initialization and increment parts
of a loop into a single statement along with the test.
The general form of the
.UL for
is
.E1
for( initialization; expression; increment )
	statement
.E2
The meaning is exactly
.E1
	initialization;
	while( expression ) {
		statement
		increment;
	}
.E2
Thus, the following code does the same array copy as the example in the previous section:
.E1
	for( i=0; (t[i]=s[i]) != '\\0'; i\*+ );
.E2
This slightly more
ornate example adds up the elements of an array:
.E1
	sum = 0;
	for( i=0; i<n; i\*+)
		sum = sum + array[i];
.E2
.PP
In the
.UL for
statement,
the initialization
can be left out if you want,
but the semicolon has to be there.
The increment is
also optional.
It is
.ul
not
followed by a semicolon.
The second clause, the test,
works the same way as in the
.UL while:
if the expression is true (not zero)
do another loop,
otherwise get on with the next statement.
As with the
.UL while,
the
.UL for
loop may be done zero times.
If the expression is left out, it is taken to be always true, so
.E1
for( ; ; ) \*.\*.\*.
.E2
and
.E1
while( 1 ) \*.\*.\*.
.E2
are both infinite loops.
.PP
You might ask why we use a
.UL for
since it's so much like a
.UL while\*.
(You might also ask why we use a
.UL while
because...)
The
.UL for 
is usually preferable because it keeps the code where it's used
and sometimes eliminates the need for compound statements,
as in this code that zeros a two-dimensional array:
.E1
for( i=0; i<n; i\*+ )
	for( j=0; j<m; j\*+ )
		array[i][j] = 0;
.E2
.NH
Functions; Comments
.PP
Suppose we want,
as part of a larger program,
to count the occurrences of the ascii
characters in some input text.
Let us also map illegal characters
(those with value>127 or <0) into one pile.
Since this is presumably an isolated part of the program,
good practice dictates making it a separate function.
Here is one way:
.E1
.ne 7
main(~) {
	int hist[129];		/\** 128 legal chars + 1 illegal group \**/
	\*.\*.\*.
	count(hist, 128);	/\** count the letters into hist \**/
	printf( \*.\*.\*. );		/\** comments look like this; use them \**/
	\*.\*.\*.		/\** anywhere blanks, tabs or newlines could appear \**/
}
.SP
count(buf, size)
   int size, buf[ ]; {
	int i, c;
	for( i=0; i<=size; i\*+ )
		buf[i] = 0;			/\** set buf to zero \**/
	while( (c=getchar(~)) != '\\0' ) {	/\** read til eof \**/
		if( c > size \*| c < 0 )
			c = size;		/\** fix illegal input \**/
		buf[c]\*+;
	}
	return;
}
.E2
We have already seen many examples of calling a function,
so let us concentrate on how to 
.ul
define
one.
Since 
.UL count
has two arguments, we need to declare them,
as shown,
giving their types, and in the case of
.UL buf,
the fact
that it is an array.
The declarations of arguments go
.ul
between
the argument list
and the opening `{'.
There is no need to specify the size of the array 
.UL buf,
for it is defined outside of
.UL count\*.
.PP
The
.UL return
statement simply says to go back to the calling routine.
In fact, we could have omitted it,
since a return is implied at the end of a function.
.PP
What if we wanted 
.UL count
to return a value, say the number of characters read?~
The 
.UL return
statement allows for this too:
.E1
	int i, c, nchar;
	nchar = 0;
	\*.\*.\*.
	while( (c=getchar(~)) != '\\0' ) {
		if( c > size \*| c < 0 )
			c = size;
		buf[c]\*+;
		nchar\*+;
	}
	return(nchar);
.E2
Any expression can appear within the parentheses.
Here is a function to compute the minimum of two integers:
.E1
.ne 4
min(a, b)
   int a, b; {
	return( a < b ? a : b );
}
.E2
.PP
.PP
To copy a character array,
we could write the function
.E1
.ne 5
strcopy(s1, s2)		/\** copies s1 to s2 \**/
   char s1[ ], s2[ ]; {
	int i;
	for( i = 0; (s2[i] = s1[i]) != '\\0'; i\*+ );
}
.E2
As is often the case, all the work is done by the assignment statement
embedded in the test part of the
.UL for\*.
Again, the declarations of the arguments 
.UL s1 
and
.UL s2
omit the sizes, because they don't matter to
.UL strcopy\*.
(In the section on pointers, we will see a more efficient
way to do a string copy.)
.PP
There is a subtlety
in function usage which can trap the unsuspecting Fortran programmer.
Simple variables (not arrays) are passed in C by
``call by value'',
which means that the called function is given
a copy of its arguments,
and doesn't know their addresses.
This makes it impossible to change the value of one of the actual input arguments.
.a
.PP
There are two ways out of this dilemma.
One is to make special arrangements to pass to the function
the address of a variable instead of its value.
The other is to make the variable a global or external variable,
which is known to each function by its name.
We will discuss both possibilities in the next few sections.
.NH
Local and External Variables
.PP
If we say
.E1
f( ) {
	int x;
	\*.\*.\*.
}
g( ) {
	int x;
	\*.\*.\*.
}
.E2
each 
.UL x
is
.ul
local
to its own routine _
the
.UL x
in
.UL f
is unrelated to the
.UL x
in
.UL g\*.
(Local variables are also called ``automatic''.)
Furthermore each local variable in a routine appears only when
the function is called, and
.ul
disappears
when the function is exited.
Local variables have no memory from one call to the next
and must be explicitly initialized upon each entry.
(There is a
.UL static
storage class for making local variables with memory;
we won't discuss it.)
.PP
As opposed to local variables,
.ul
external variables
are defined external to all functions,
and are (potentially) available to all functions.
External storage
.ne 6
always remains in existence.
To make variables external we have to
.ul
define
them external to all functions,
and, wherever we want to use them,
make a
.ul
declaration.
.E1
main(~) {
	extern int nchar, hist[ ];
	\*.\*.\*.
	count(~);
	\*.\*.\*.
}
.SP
.ne 7
count(~) {
	extern int nchar, hist[ ];
	int i, c;
	\*.\*.\*.
}
.SP
int	hist[129];	/\** space for histogram \**/
int	nchar;		/\** character count \**/
.E2
Roughly speaking, any function that wishes to access an external variable
must contain an
.UL extern
declaration
for it.
The declaration is the same as others,
except for the added keyword
.UL extern\*.
Furthermore, there must somewhere be a
.ul
definition
of the external variables external to all functions.
.PP
External variables can be initialized;
they are set to zero if not explicitly
initialized.
In its simplest form,
initialization is done by putting the value (which must be a constant) after the definition:
.E1
int	nchar	0;
char	flag	'f';
.ft R
  etc\*.
.E2
This is discussed further in a later section.
.SP
.PP
This ends our discussion of what might be called
the central core of C.
You now have enough to write quite substantial
C programs,
and it would probably be a good idea if you paused
long enough to do so.
The rest of this tutorial
will describe some more ornate constructions,
useful but not essential.
.SP
.SP