.NH
Pointers
.PP
A
.ul
pointer
in C is the address of something.
It is a rare case indeed when we care what
the specific address itself is,
but pointers are a quite common way to get at
the contents of something.
The unary operator `&' is used to produce the address of
an object, if it has one. Thus
.E1
	int a, b;
	b = &a;
.E2
puts the address of
.UL a
into
.UL b\*.
We can't do much with it except print it or pass it
to some other routine, because we haven't given
.UL b
the right kind of declaration.
But if we declare that
.UL b
is indeed a
.ul
pointer
to an integer, we're in good shape:
.E1
	int a, \**b, c;
	b = &a;
	c = \**b;
.E2
.UL b
contains the address of
.UL a
and
.UL `c 
.UL =
.UL \**b'
means to use the value in
.UL b
as an address, i.e., as a pointer.
The effect is that we get back the contents of 
.UL a,
albeit rather indirectly.
(It's always the case that 
.UL `\**&x'
is the same as
.UL x
if
.UL x
has an address.)
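.PP
As a small sketch of these two operators at work,
here are two helper functions.
The names
.UL fetch
and
.UL store
are made up for illustration,
and the function headers are written in the newer prototype style
rather than the older form used elsewhere in this tutorial.

```c
/* fetch: hypothetical helper; return the int that p points to */
int fetch(int *p)
{
	return *p;	/* use the value in p as an address */
}

/* store: hypothetical helper; put v into the int that p points to */
void store(int *p, int v)
{
	*p = v;
}
```

Given an integer
.UL a
and a pointer
.UL b
set to
.UL &a,
a call of
.UL store(b,9)
changes
.UL a
itself, and
.UL fetch(b)
then returns the new contents of
.UL a\*.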
.PP
The most frequent use of pointers in C is for walking
efficiently along arrays.
In fact,
in the implementation of an array,
the array name
represents the address of the zeroth element of the array,
so you can't use it on the left side of an assignment.
(You can't change the address of something by assigning to it.)
If we say
.E1
char \**y;
char x[100];
.E2
.UL y
is of type pointer to character
(although it doesn't yet point anywhere).
We can make 
.UL y
point to
an element of
.UL x
by either of
.E1
y = &x[0];
y = x;
.E2
Since 
.UL x
is the address of
.UL x[0],
this is legal and consistent.
.PP
Now 
.UL `\**y'
gives
.UL x[0]\*.
More importantly,
.E1
\**(y+1)	gives x[1]
\**(y+i)	gives x[i]
.E2
and the sequence
.E1
	y = &x[0];
	y\*+;
.E2
leaves 
.UL y
pointing at
.UL x[1]\*.
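.PP
The two facts above can be tried out with a pair of small functions.
This is only a sketch:
the names
.UL nth
and
.UL second
are invented here,
and the headers use the newer prototype style.

```c
/* nth: hypothetical; fetch x[i] by pointer arithmetic */
char nth(char *x, int i)
{
	char *y;

	y = &x[0];		/* y = x would do equally well */
	return *(y + i);	/* same as x[i] */
}

/* second: hypothetical; fetch x[1] by incrementing a pointer */
char second(char *x)
{
	char *y;

	y = x;
	y++;			/* y now points at x[1] */
	return *y;
}
```

Neither function checks that the element asked for actually lies inside the array;
that is the caller's job.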
.PP
Let's use pointers
in a function
.UL length
that computes how long a character array is.
Remember that by convention all character arrays are
terminated with a `\\0'.
(And if they aren't, this program will blow up inevitably.)
The old way:
.E1
length(s)
   char s[ ]; {
	int n;
	for( n=0; s[n] != '\\0'; )
		n\*+;
	return(n);
}
.E2
Rewriting with pointers gives
.E1
length(s)
   char \**s; {
	int n;
	for( n=0; \**s != '\\0'; s\*+ )
		n\*+;
	return(n);
}
.E2
You can now see why we have to say what kind of thing
.UL s
points to _
if we're to increment it with
.UL s\*+
we have to increment it by the right amount.
.PP
The pointer version is usually more efficient
(how much depends on the compiler)
but even more compact is
.E1
	for( n=0; \**s\*+ != '\\0'; n\*+ );
.E2
The 
.UL `\**s'
returns a character;
the
.UL `\*+'
increments the pointer so we'll get the next character
next time around.
As you can see, as we make things more efficient,
we also make them less clear.
But
.UL `\**s\*+'
is an idiom so common
that you have to know it.
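.PP
For reference, here is one way the whole of
.UL length
might look with the compact loop in place.
It is a sketch in the newer prototype style,
which differs from the older headers used above;
the logic is otherwise the same.

```c
/* length: count the characters before the terminating '\0' */
int length(char *s)
{
	int n;

	for (n = 0; *s++ != '\0'; n++)
		;	/* empty body: the loop header does all the work */
	return n;
}
```

Note that
.UL s
ends up one position past the `\\0',
which is harmless here because
.UL s
is only the function's private copy of the pointer.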
.PP
Going a step further,
here's our function
.UL strcopy
that copies a character array
.UL s
to another
.UL t\*.
.E1
strcopy(s,t)
   char \**s, \**t; {
	while(\**t\*+ = \**s\*+);
}
.E2
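We have omitted the explicit test against `\\0',
because `\\0' is identically zero:
the value of the assignment is the character just copied,
and the loop stops once that value is zero,
that is, after the terminating `\\0' has itself been copied.
You will often see the code this way.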
(You 
.ul
must
have a space after the `=':
see section 25.)
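.PP
If the bare assignment in the loop test looks too cryptic,
the comparison can be written out in full.
The following is a sketch of that variant in the newer prototype style;
it assumes, as the original does,
that the destination has room for every character copied.

```c
/* strcopy: copy s to t, including the terminating '\0'.
   Nothing here checks that t is big enough. */
void strcopy(char *s, char *t)
{
	while ((*t++ = *s++) != '\0')
		;	/* the copying is done in the test itself */
}
```

The extra parentheses matter:
without them the comparison would be done first
and its result, not the character, would be assigned.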
.PP
For arguments
to a function, and there only,
the declarations
.E1
char s[ ];
char \**s;
.E2
are equivalent _ a pointer to a type,
or an array of unspecified size of that type, are the same thing.
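.PP
A short sketch makes the equivalence concrete.
The three functions below are invented for illustration
and are written in the newer prototype style.

```c
/* first_a, first_p: hypothetical; identical except for how s is declared */
char first_a(char s[])
{
	return s[0];
}

char first_p(char *s)
{
	return *s;
}

/* second_a: hypothetical; an array argument is really a pointer,
   so it can be incremented like one */
char second_a(char s[])
{
	s++;
	return s[0];
}
```

The increment in
.UL second_a
would not be allowed on an array declared inside the function;
it works only because the argument is a pointer under either spelling.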
.PP
If this all seems mysterious, copy these forms until they become second nature.
You don't often need anything more complicated.
.NH
Function Arguments
.PP
Look back at the function
.UL strcopy
in the previous section.
We passed it two string names as arguments, then proceeded
to clobber both of them by incrementation.
So how come we don't lose the original strings
in the function that called
.UL strcopy?
.PP
As we said before,
C is a ``call by value'' language:
when you make a function call like
.UL f(x),
the
.ul
value
of
.UL x
is passed,
not its address.
So there's no way to 
.ul
alter
.UL x
from inside
.UL f\*.
If
.UL x
is an array
.UL (char 
.UL x[10])
this isn't a problem, because
.UL x
.ul
is
an address anyway,
and you're not trying to change it,
just what it addresses.
This is why
.UL strcopy
works as it does.
And it's convenient not to have to worry about
making temporary copies of the input arguments.
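.PP
The difference between the two cases can be sketched as follows.
Both functions are hypothetical and use the newer prototype style.

```c
/* bump: hypothetical; changes only its own copy of x */
void bump(int x)
{
	x++;
}

/* clear_first: hypothetical; x is an address, so the caller's
   array is what gets changed */
void clear_first(char x[])
{
	x[0] = '\0';
}
```

Calling
.UL bump(a)
leaves the caller's
.UL a
exactly as it was,
while calling
.UL clear_first
on a character array empties the caller's string.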
.PP
But what if
.UL x
is a scalar and
you do want to change it?
In that case,
you have to pass the
.ul
address
of
.UL x
to
.UL f,
and then use it as a pointer.
Thus for example, to interchange two integers, we must write
.E1
flip(x, y)
   int \**x, \**y; {
	int temp;
	temp = \**x;
	\**x = \**y;
	\**y = temp;
}
.E2
and to call 
.UL flip,
we have to pass the addresses of the variables:
.E1
flip (&a, &b);
.E2
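.PP
The same technique is a common way to get more than one result back from a function.
As a sketch, the hypothetical function
.UL divmod
below, written in the newer prototype style,
hands back a quotient and a remainder through two pointers.

```c
/* divmod: hypothetical; store a/b and a%b where quot and rem point.
   The caller must make sure b is not zero. */
void divmod(int a, int b, int *quot, int *rem)
{
	*quot = a / b;
	*rem = a % b;
}
```

As with
.UL flip,
the caller passes addresses,
for example
.UL divmod(17,5,&q,&r).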
.NH
Multiple Levels of Pointers; Program Arguments
.PP
When a C program is called,
the arguments on the command line are made available
to the main program as an argument count
.UL argc
and an array of character strings
.UL argv
containing the arguments.
Manipulating these arguments is one of the most common uses of
multiple levels of pointers
(``pointer to pointer to ...'').
By convention,
.UL argc
is greater than zero;
the first argument
(in
.UL argv[0])
is the command name itself.
.PP
Here is a program that simply echoes its arguments.
.E1
main(argc, argv)
   int argc;
   char \**\**argv; {
	int i;
	for( i=1; i < argc; i\*+ )
		printf("%s ", argv[i]);
	putchar('\\n');
}
.E2
Step by step:
.UL main
is called with two arguments,
the argument count and the array of arguments.
.UL argv
is a pointer to an array,
whose individual elements are pointers to arrays of characters.
The zeroth argument is the name of the command itself,
so we start to print 
with the first argument, until we've printed them all.
Each 
.UL argv[i]
points to a character array, so we use a
.UL `%s'
in the
.UL printf\*.
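.PP
To see both levels of pointer in use at once,
here is a sketch of a function that adds up the lengths of the arguments.
The name
.UL argchars
is made up,
and the header uses the newer prototype style.

```c
/* argchars: hypothetical; total number of characters in
   argv[1] through argv[argc-1], not counting the '\0's */
int argchars(int argc, char **argv)
{
	int i, j, n;

	n = 0;
	for (i = 1; i < argc; i++)	/* i picks an argument */
		for (j = 0; argv[i][j] != '\0'; j++)	/* j picks a character in it */
			n++;
	return n;
}
```

Here
.UL argv[i]
selects one of the pointers,
and the second subscript selects one character
of the array that pointer addresses.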
.PP
You will sometimes see the declaration of
.UL argv
written as
.E1
char \**argv[ ];
.E2
which is equivalent.
But we can't use 
.UL char 
.UL argv[ 
.UL ][
.UL ],
because both dimensions are variable and
there would be no way to figure
out how big the array is.
.PP
Here's a bigger example using
.UL argc
and
.UL argv\*.
A common convention in C programs is that 
if the first argument begins with `\(mi',
it indicates a flag of some sort.
For example, suppose we want a program to be callable as
.E1
prog -abc arg1 arg2 \*.\*.\*.
.E2
where the `\(mi' argument is optional;
if it is present, it may be followed by any combination of
a, b, and c.
.E1
main(argc, argv)
   int argc;
   char \**\**argv; {
	\*.\*.\*.
	aflag = bflag = cflag  = 0;
	if( argc > 1 && argv[1][0] \*= '-' ) {
		for( i=1; (c=argv[1][i]) != '\\0'; i\*+ )
			if( c\*='a' )
				aflag\*+;
			else if( c\*='b' )
				bflag\*+;
			else if( c\*='c' )
				cflag\*+;
			else
				printf("%c?\\n", c);
		--argc;
		\*+argv;
	}
	\*.\*.\*.
.E2
.PP
There are several things worth noticing about this code.
First, there is a real need for the left-to-right evaluation
that && provides;
we don't want to look at
.UL argv[1]
unless we know it's there.
Second, the statements
.E1
	--argc;
	\*+argv;
.E2
let us march along the argument list by one position,
so we can skip over the flag argument as if it had never existed _
the rest of the program is independent of
whether or not there was a flag argument.
This only works because 
.UL argv
is a pointer which can be incremented.
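.PP
The fragment above leaves out its declarations.
One way to fill them in, offered only as a sketch,
is to move the flag scan into a function of its own.
The name
.UL getflags
and its return convention are assumptions made for this example,
and the header uses the newer prototype style.

```c
#include <stdio.h>

int aflag, bflag, cflag;	/* flag counts, as in the text */

/* getflags: hypothetical; if argv[1] begins with '-', count the
   flags in it and return 1, the number of arguments the caller
   should skip.  Otherwise return 0. */
int getflags(int argc, char **argv)
{
	int i, c;

	aflag = bflag = cflag = 0;
	if (argc > 1 && argv[1][0] == '-') {
		for (i = 1; (c = argv[1][i]) != '\0'; i++)
			if (c == 'a')
				aflag++;
			else if (c == 'b')
				bflag++;
			else if (c == 'c')
				cflag++;
			else
				printf("%c?\n", c);
		return 1;
	}
	return 0;
}
```

The caller subtracts the returned value from
.UL argc
and adds it to
.UL argv,
which has the same effect as the two statements shown above
when a flag argument was present, and no effect when it was not.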
.NH
The Switch Statement; Break; Continue
.PP
The
.UL switch
statement
can be used to replace the multi-way test
we used in the last example.
When the tests are like this:
.E1
if( c \*= 'a' ) \*.\*.\*.
else if( c \*= 'b' ) \*.\*.\*.
else if( c \*= 'c' ) \*.\*.\*.
else \*.\*.\*.
.E2
testing a value against a series of
.ul
constants,
the
.UL switch
statement is often clearer and usually gives better code.
Use it like this:
.E1
switch( c ) {

case 'a':
	aflag\*+;
	break;
case 'b':
	bflag\*+;
	break;
case 'c':
	cflag\*+;
	break;
default:
	printf("%c?\\n", c);
	break;
}
.E2
The
.UL case
statements label the various actions we want;
.UL default
gets done if none of the other cases are satisfied.
(A
.UL default
is optional;
if it isn't there,
and none of the cases match,
you just fall out the bottom.)
.PP
The
.UL break
statement in this example is new.
It is there
because
the cases are just labels,
and after you do one of them,
you
.ul
fall through
to the next unless you take some explicit action to escape.
This is a mixed blessing.
On the positive side,
you can have multiple cases on a single statement;
we might want to allow both upper and lower case letters in our flag field,
so we could say
.E1
case 'a':  case 'A':	\*.\*.\*.
.SP
case 'b':  case 'B':	\*.\*.\*.
 etc\*.
.E2
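.PP
Filled in completely, such a switch might look like the sketch below.
The function
.UL flagindex
and the numbers it returns are invented for illustration,
and the header uses the newer prototype style.

```c
/* flagindex: hypothetical; map a flag letter of either case
   to 0, 1 or 2, or to -1 if it is not a flag letter */
int flagindex(int c)
{
	int n;

	n = -1;
	switch (c) {
	case 'a':  case 'A':
		n = 0;
		break;
	case 'b':  case 'B':
		n = 1;
		break;
	case 'c':  case 'C':
		n = 2;
		break;
	}
	/* there is no default, so anything else leaves n at -1 */
	return n;
}
```

Each pair of labels shares one statement,
and each
.UL break
keeps control from running on into the next pair.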
But what if we just want to get out 
after doing
.UL case 
.UL `a'
?
We could get out of a
.UL case
of the
.UL switch
with a label and a
.UL goto,
but this is really ugly.
The
.UL break
statement lets us exit without either
.UL goto
or label.
.E1
switch( c ) {

case 'a':
	aflag\*+;
	break;
case 'b':
	bflag\*+;
	break;
 \*.\*.\*.
}
/\** the break statements get us here directly \**/
.E2
The
.UL break
statement
also works in
.UL for
and
.UL while
statements _
it causes an immediate exit from the loop.
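.PP
As a sketch of
.UL break
in a loop,
the hypothetical function
.UL findchar
below stops searching as soon as it has found what it wants.
The header uses the newer prototype style.

```c
/* findchar: hypothetical; index of the first c in s, or -1 if absent */
int findchar(char *s, int c)
{
	int i, found;

	found = -1;
	for (i = 0; s[i] != '\0'; i++)
		if (s[i] == c) {
			found = i;
			break;	/* leave the for at once */
		}
	return found;
}
```

Without the
.UL break
the loop would run to the end of the string
and report the last match instead of the first.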
.PP
The
.UL continue
statement works
.ul
only
inside
.UL for's
and
.UL while's;
it causes the next iteration of the loop to be started.
This means it goes to the increment part of the
.UL for
and the test part of the
.UL while\*.
We could have used a
.UL continue
in our example to get on with the next iteration
of the
.UL for,
but it seems clearer to use
.UL break
instead.
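.PP
For comparison, here is a sketch of what the
.UL continue
version could look like.
The function
.UL scanflags
and its count of unrecognized letters are assumptions made for this example,
and the header uses the newer prototype style.

```c
int aflag, bflag, cflag;	/* flag counts, as in the text */

/* scanflags: hypothetical; tally the flag letters in s and
   return the number of characters that were not flags */
int scanflags(char *s)
{
	int i, bad;

	aflag = bflag = cflag = 0;
	bad = 0;
	for (i = 0; s[i] != '\0'; i++) {
		switch (s[i]) {
		case 'a':
			aflag++;
			continue;	/* straight to the i++ of the for */
		case 'b':
			bflag++;
			continue;
		case 'c':
			cflag++;
			continue;
		}
		bad++;	/* reached only when no case matched */
	}
	return bad;
}
```

Inside the switch, a
.UL break
would only leave the switch and go on to the statement after it,
while
.UL continue
skips the rest of the loop body as well.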