.sp |2.5i
.ce 3
.ps 12
.ft G
Programming in C _ A Tutorial
.ps 10
.sp
.ft R
Brian W. Kernighan
.ft I
.sp
Bell Laboratories, Murray Hill, N. J.
.sp |4.1i
.ft R
.ps 10
.fi
.vs 12p
.NH
Introduction
.PP
C is a computer language
available on the
.UC GCOS
and
.UC UNIX
operating systems at Murray Hill and (in preliminary form) on OS/360 at Holmdel.
C lets you write your programs clearly and simply _
it has decent control flow facilities so your code can be read
straight down the page, without labels or GOTOs;
it lets you write code that is compact without
being too cryptic;
it encourages modularity and good program organization;
and it provides good data-structuring facilities.
.PP
This memorandum is a tutorial to make learning C as painless as possible.
The first part concentrates on 
the central features of C;
the second part discusses
those parts of the language which are
useful (usually for getting more efficient
and smaller code)
but which are not necessary for the new user.
This is
.ul
not
a reference manual.
Details and special cases will be skipped ruthlessly,
and no attempt will be made to cover every language feature.
The order of presentation is hopefully pedagogical
instead of logical.
Users who would like the full story should consult the
.ul
C Reference Manual
by D. M. Ritchie [1],
which should be read for details anyway.
Runtime support is described in
[2] and [3];
you will have to read one of these to learn how
to compile and run a C program.
.PP
We will assume that you are familiar with the mysteries of 
creating files,
text editing, and the like
in the operating system you run on,
and that you have programmed in some language before.
.NH
A Simple C Program
.PP
.E1
main(~) {
	printf("hello, world");
}
.E2
.PP
A C program consists of one or more
.ul
functions
(similar to
the functions and subroutines of a Fortran program or the procedures
of PL/I),
and perhaps some external data definitions.
.UL main
is such a function, and in fact all C programs must have a
.UL main\*.
Execution of the program begins at the first statement of 
.UL main\*.
.UL main
will usually invoke other functions to perform its job, some
coming from the same program, and others from libraries.
.PP
One method of communicating data between functions
is by arguments.
The parentheses following the function name surround the argument list;
here
.UL main
is a function of no arguments, indicated by (~).
The {} enclose the statements of the function.
Individual statements end with a semicolon
but are otherwise free-format.
.PP
.UL printf
is a library function which will format and print
output
on the terminal (unless some other destination is
specified).
In this case it prints
.E1
hello, world
.E2
A function is invoked by naming it,
followed by a list of arguments in parentheses.
There is
no
.UC CALL
statement as in Fortran or 
.UC PL/I.
.NH
A Working C Program; Variables; Types and Type Declarations
.PP
Here's a bigger program that adds three integers and prints their sum.
.E1
main(~) {
	int a, b, c, sum;
	a = 1;  b = 2;  c = 3;
	sum = a + b + c;
	printf("sum is %d", sum);
}
.E2
.PP
Arithmetic and the assignment statements are much
the same as in Fortran (except for the semicolons)
or
.UC PL/I.
The format of C programs is quite free.
We can put several statements on a line if we want,
or we can split a statement among several lines if
it seems desirable. The split may be between any of the operators or variables,
but
.ul
not
in the middle of a name or operator.
As a matter of style,
spaces, tabs, and newlines should be used freely
to enhance readability.
.PP
C has four 
fundamental
.ul
types
of variables:
.DS
\fGint\fR	integer (PDP-11: 16 bits; H6070: 36 bits; IBM360: 32 bits)
\fGchar\fR	one byte character (PDP-11, IBM360: 8 bits; H6070: 9 bits)
\fGfloat\fR	single-precision floating point
\fGdouble\fR	double-precision floating point
.DE
There are also
.ul
arrays
and
.ul
structures
of these basic types,
.ul
pointers
to them
and
.ul
functions
that return them,
all of which we will meet shortly.
.PP
.ul
All
variables in a C program must be declared,
although this can sometimes be done implicitly by context.
Declarations must precede executable statements.
The declaration
.E1
int a, b, c, sum;
.E2
declares
.UL a,
.UL b,
.UL c,
and
.UL sum
to be integers.
.PP
Variable names are chosen from A-Z, a-z, 0-9, and \(ul,
and start with a non-digit; only the first eight characters are significant.
Stylistically, it's much better to use only a single case
and give functions and external variables names that are unique in the first
six characters.
(Function and external variable names are used by various assemblers, some of which are limited
in the size and case of identifiers they can handle.)
Furthermore, keywords and library functions
may only be recognized in one case.
.NH
Constants
.PP
We have already seen decimal integer constants in
the previous example _
1, 2, and 3.
Since C is often used for system programming and bit-manipulation, octal
numbers are an important part of the language.
In C, any number that begins with 0
(zero!)
is an octal integer (and hence can't have
any 8's or 9's in it).
Thus 0777 is an octal constant, with decimal value 511.
.PP
A ``character'' is one byte
(an inherently machine-dependent concept).
Most often this is expressed as a 
.ul
character constant,
which is one character enclosed in single quotes.
However, it may be any quantity that fits in a byte,
as in
.UL flags
below:
.E1
char quest, newline, flags;
quest = '?';
newline = '\\n';
flags = 077;
.E2
.PP
The sequence `\\n' is C notation for ``newline character'', which, when printed, moves
the terminal to the beginning of the next line.
Notice that `\\n' represents only a single character.
There are several other ``escapes'' like `\\n' for representing hard-to-get or invisible
characters,
such as
`\\t' for tab,
`\\b' for backspace,
`\\0' for the null character (which this system uses to signal end of file, as we will see shortly),
and
`\\\\' for the backslash itself.
.PP
.UL float
and
.UL double
constants
are discussed in section 26.
.NH
Simple I/O _ getchar, putchar, printf
.PP
.E1
main( ) {
	char c;
	c = getchar(~);
	putchar(c);
}
.E2
.PP
.UL getchar
and
.UL putchar
are the basic I/O library functions in C.
.UL getchar
fetches one character
from the standard input
(usually the terminal)
each time it is called, and returns that character
as the
value of the function.
When it reaches the end of whatever file it is reading,
it returns the character represented by `\\0'
(ASCII
.UC NUL,
which has value zero),
and it continues to do so on every later call.
We will see how to use this very shortly.
.PP
.UL putchar
puts one character out on the standard output
(usually the terminal)
each time it is called.
So the program above
reads one character and writes it back out.
By itself, this isn't very interesting,
but observe that if we put a loop around this,
and add a test for end of file,
we have a complete program for
copying one file to another.
.PP
.UL printf
is a more complicated function
for producing formatted output.
We will talk about only the simplest use
of it.
Basically,
.UL printf
uses its first argument as formatting information,
and any successive arguments
as variables to be output.
Thus
.E1
printf ("hello, world\\n");
.E2
is the simplest use _
the string ``hello, world\\n''
is printed out.
No formatting information, no variables,
so the string is dumped out verbatim.
The newline is necessary to put this out on a line by itself.
(The construction
.E1
"hello, world\\n"
.E2
is really an array of
.UL chars\*.
More about this shortly.)
.PP
More complicated, if
.UL sum
is
6,
.E1
printf ("sum is %d\\n", sum);
.E2
prints
.E1
sum is 6
.E2
Within the first argument of
.UL printf,
the characters ``%d'' signify that the next argument
in the argument list is to be printed as a
base 10
number.
.PP
Other useful formatting commands are ``%c'' to print out a single character,
``%s'' to print out an entire string,
and ``%o'' to print a number as octal instead of decimal
(no leading zero).
For example,
.E1
n = 511;
printf ("What is the value of %d in octal?", n);
printf ("  %s! %d decimal is %o octal\\n", "Right", n, n);
.E2
prints
.E1
.fi
What is the value of 511 in octal?
Right! 511 decimal is 777 octal
.E2
Notice that the first format string does not end with a newline,
so the output of the second call continues on the same line.
Successive calls to
.UL printf
(and/or
.UL putchar,
for that matter)
simply put out characters.
No newlines are printed unless you ask for them.
Similarly, on input, characters are read one at a time
as you ask for them.
Each line is generally terminated by a newline (\\n),
but there is otherwise no concept of a record.