.ds :? UNIX Programming
.de PT
.lt \\n(LLu
.pc %
.nr PN \\n%
.if \\n%-1 .if o .tl '\s9\f2\*(:?\fP''\\n(PN\s0'
.if \\n%-1 .if e .tl '\s9\\n(PN''\f2\*(:?\^\fP\s0'
.lt \\n(.lu
..
.hy 14
.TL
.bd 1 3
\!.bd 1 3
\f1U\s-2NIX\s+2 Programming\-Second Edition\fP
.AU
.bd 1
\!.bd 1
Brian W. Kernighan
.AU
Dennis M. Ritchie
.AI
.MH
.AB
.PP
This paper is an introduction to programming on
the
.UX
system.
The emphasis is on how to write programs that interface
to the operating system,
either directly or through the standard I/O library.
The topics discussed include
.IP "  \(bu"
handling command arguments
.IP "  \(bu"
rudimentary I/O; the standard input and output
.IP "  \(bu"
the standard I/O library; file system access
.IP "  \(bu"
low-level I/O: open, read, write, close, seek
.IP "  \(bu"
processes: exec, fork, pipes
.IP "  \(bu"
signals\-interrupts, etc.
.PP
There is also an appendix that describes
the standard I/O library in detail.
.AE
.SH "1. \|INTRODUCTION"
.PP
This paper describes how to write
programs
that interface with the
.UC UNIX
operating system in a non-trivial way.
This includes programs that use files by name,
that use pipes,
that invoke other commands as they run,
or that attempt to catch interrupts and other signals
during execution.
.PP
The document collects material which is scattered
throughout several sections of
.I
The
U\s-1NIX\s+1
Programmer's Manual
.R
[1]
for Version 7
.UC UNIX .
There is no attempt to be complete;
only generally useful material is dealt with.
It is assumed that you will be programming in C,
so you must be able to read the language
roughly up to the level of
.I
The C Programming Language
.R
[2].
Some of the material in sections 2 through 4
is based on
topics covered more carefully there.
You should also be familiar with
.UC UNIX
itself
at least
to the level of
.I
U\s-1NIX\s+1
for Beginners
.R
[3].
.SH "2. \|BASICS"
.SH "2.1 \|Program Arguments"
.PP
When a C program is run as a command,
the arguments on the command line are made available
to the
function
.UL main
as an argument count
.UL argc
and an array
.UL argv
of
pointers to
character strings
that contain
the arguments.
By convention,
.UL argv[0]
is the command name itself,
so
.UL argc
is always greater than 0.
.PP
The following program illustrates the mechanism:
it simply echoes its arguments
back to the terminal.
(This is essentially the
.UL echo
command.)
.P1
main(argc, argv)	/\(** echo arguments \(**/
int argc;
char \(**argv[\|];
{
	int i;
.sp 0.5v
	for (i = 1; i < argc; i++)
		printf("%s%c", argv[i], (i<argc-1) ? ' ' : '\n');
}
.P2
.UL argv
is a pointer to an array
whose individual elements are pointers to arrays of characters;
each is terminated by
.UL \e0 ,
so they can be treated as strings.
The program starts by printing
.UL argv[1]
and loops until it has printed them all.
.PP
The argument count and the arguments
are parameters to
.UL main .
If you want to keep them around so other
routines can get at them, you must
copy them to external variables.
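.PP
As a minimal sketch of that advice,
the fragment below copies the parameters of
.UL main
into two external variables.
The names
.UL saved_argc ,
.UL saved_argv ,
.UL save_args
and
.UL nth_arg
are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* External copies of main's parameters; the names are hypothetical. */
int    saved_argc;
char **saved_argv;

/* Call once at the top of main to make the arguments visible elsewhere. */
void save_args(int argc, char **argv)
{
	saved_argc = argc;
	saved_argv = argv;
}

/* Return the i-th argument, or NULL if there is no such argument. */
const char *nth_arg(int i)
{
	if (i < 0 || i >= saved_argc)
		return NULL;
	return saved_argv[i];
}
```

A program would call save_args(argc, argv) as the first statement of main;
after that, any routine can fetch an argument with nth_arg.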
.SH "2.2 \|The ``Standard Input'' and ``Standard Output''"
.PP
The simplest input mechanism is to read the ``standard input,''
which is generally the user's terminal.
The function
.UL getchar
returns the next input character each time it is called.
A file may be substituted for the terminal by
using the
.UL <
convention:
if
.UL prog
uses
.UL getchar ,
then
the command line
.P1
prog <file
.P2
causes
.UL prog
to read
.UL file
instead of the terminal.
.UL prog
itself need know nothing about where its input
is coming from.
This is also true if the input comes from another program via
the
.UX
pipe mechanism:
.P1
otherprog | prog
.P2
provides the standard input for
.UL prog
from the standard output of
.UL otherprog .
.PP
.UL getchar
returns the value
.UL EOF
when it encounters the end of file
(or an error)
on whatever you are reading.
The value of
.UL EOF
is normally defined to be
.UL -1 ,
but it is unwise to take any advantage
of that knowledge.
As will become clear shortly,
this value is automatically defined for you when
you compile a program,
and need not be of any concern.
.PP
Similarly,
.UL putchar(c)
puts the character
.UL c
on the ``standard output,''
which is also by default the terminal.
The output can be captured on a file
by using
.UL > :
if
.UL prog
uses
.UL putchar ,
.P1
prog >outfile
.P2
writes the standard output on
.UL outfile
instead of the terminal.
.UL outfile
is created if it doesn't exist;
if it already exists, its previous contents are overwritten.
And a pipe can be used:
.P1
prog | otherprog
.P2
puts the standard output of
.UL prog
into the standard input of
.UL otherprog .
.PP
The function
.UL printf ,
which formats output in various ways,
uses
the same mechanism as
.UL putchar
does,
so calls to
.UL printf
and
.UL putchar
may be intermixed in any order;
the output will appear in the order of the calls.
.PP
Similarly, the function
.UL scanf
provides for formatted input conversion;
it will read the standard input and break it
up into strings, numbers, etc.,
as desired.
.UL scanf
uses the same mechanism as
.UL getchar ,
so calls to them may also be intermixed.
.PP
Many programs
read only one input and write one output;
for such programs I/O
with
.UL getchar ,
.UL putchar ,
.UL scanf ,
and
.UL printf
may be entirely adequate,
and it is almost always enough to get started.
This is particularly true if
the
.UC UNIX
pipe facility is used to connect the output of
one program to the input of the next.
For example, the following program
strips out all
.UC ASCII
control characters
from its input
(except for new-line and tab).
.P1
#include <stdio.h>
.sp 0.5v
main(\|)	/\(** ccstrip: strip non-graphic characters \(**/
{
	int c;
	while ((c = getchar(\|)) != EOF)
		if ((c >= ' ' && c < 0177) || c == '\t' || c == '\n')
			putchar(c);
	exit(0);
}
.P2
The line
.P1
#include <stdio.h>
.P2
should appear at the beginning of each source file.
It causes the C compiler to read a file
.IT /usr/include/stdio.h ) (
of
standard routines and symbols
that includes the definition of
.UL EOF .
.PP
If it is necessary to treat multiple files,
you can use
.UL cat
to collect the files for you:
.P1
cat file1 file2 .\|.\|. | ccstrip >output
.P2
and thus avoid learning how to access files from a program.
By the way,
the call to
.UL exit
at the end is not necessary to make the program work
properly,
but it assures that any caller
of the program will see a normal termination status
(conventionally 0)
from the program when it completes.
Section 5.3 discusses status returns in more detail.
.SH "3. \|THE STANDARD I/O LIBRARY"
.PP
The ``Standard I/O Library''
is a collection of routines
intended to provide
efficient
and portable
I/O services
for most C programs.
The standard I/O library is available on each system that supports C,
so programs that confine
their system interactions
to its facilities
can be transported from one system to another essentially without change.
.PP
In this section, we will discuss the basics of the standard I/O library.
The appendix contains a more complete description of its capabilities.
.SH "3.1 \|File Access"
.PP
The programs written so far have all
read the standard input and written the standard output,
which we have assumed are magically pre-defined.
The next step
is to write a program that accesses
a file that is
.ul
not
already connected to the program.
One simple example is
.IT wc ,
which counts the lines, words and characters
in a set of files.
For instance, the command
.P1
wc x.c y.c
.P2
prints the number of lines, words and characters
in
.UL x.c
and
.UL y.c
and the totals.
.PP
The question is how to arrange for the named files
to be read\-that is, how to connect the file system names
to the I/O statements which actually read the data.
.PP
The rules are simple.
Before it can be read or written
a file has to be
.ul
opened
by the standard library function
.UL fopen .
.UL fopen
takes an external name
(like
.UL x.c
or
.UL y.c ),
does some housekeeping and negotiation with the operating system,
and returns an internal name
which must be used in subsequent
reads or writes of the file.
.PP
This internal name is actually a pointer,
called a
.IT file
.IT pointer ,
to a structure
which contains information about the file,
such as the location of a buffer,
the current character position in the buffer,
whether the file is being read or written,
and the like.
Users don't need to know the details,
because part of the standard I/O definitions
obtained by including
.UL stdio.h
is a structure definition called
.UL FILE .
The only declaration needed for a file pointer
is exemplified by
.P1
FILE	\(**fp, \(**fopen(\|);
.P2
This says that
.UL fp
is a pointer to a
.UL FILE ,
and
.UL fopen
returns a pointer to
a
.UL FILE .
.UL FILE \& (
is a type name, like
.UL int ,
not a structure tag.)
.PP
The actual call to
.UL fopen
in a program
is
.P1
fp = fopen(name, mode);
.P2
The first argument of
.UL fopen
is the
name
of the file,
as a character string.
The second argument is the
mode,
also as a character string,
which indicates how you intend to
use the file.
The only allowable modes are
read
.UL \&"r" ), (
write
.UL \&"w" ), (
or append
.UL \&"a" ). (
.PP
If a file that you open for writing or appending does not exist,
it is created
(if possible).
Opening an existing file for writing causes the old contents
to be discarded.
Trying to read a file that does not exist
is an error,
and there may be other causes of error
as well
(like trying to read a file
when you don't have permission).
If there is any error,
.UL fopen
will return the null pointer
value
.UL NULL
(which is defined as zero in
.UL stdio.h ).
.PP
The next thing needed is a way to read or write the file
once it is open.
There are several possibilities,
of which
.UL getc
and
.UL putc
are the simplest.
.UL getc
returns the next character from a file;
it needs the file pointer to tell it what file.
Thus
.P1
c = getc(fp)
.P2
places in
.UL c
the next character from the file referred to by
.UL fp ;
it returns
.UL EOF
when it reaches end of file.
.UL putc
is the inverse of
.UL getc :
.P1
putc(c, fp)
.P2
puts the character
.UL c
on the file
.UL fp
and returns
.UL c .
.UL getc
and
.UL putc
return
.UL EOF
on error.
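.PP
To show the two routines working together,
here is a sketch of a function that copies one open file to another
a character at a time.
The name
.UL copy_stream
and its return convention are assumptions made for this example,
not part of the library.

```c
#include <stdio.h>

/* Copy everything from in to out; return the number of characters
   copied, or -1L if a write fails. */
long copy_stream(FILE *in, FILE *out)
{
	int c;		/* int, not char, so EOF can be told from data */
	long count = 0;

	while ((c = getc(in)) != EOF) {
		if (putc(c, out) == EOF)
			return -1L;
		count++;
	}
	return count;
}
```

Both arguments must already be open,
for example as file pointers returned by fopen.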
.PP
When a program is started, three files are opened automatically,
and file pointers are provided for them.
These files are the standard input,
the standard output,
and the standard error output;
the corresponding file pointers are
called
.UL stdin ,
.UL stdout ,
and
.UL stderr .
Normally these are all connected to the terminal,
but
may be redirected to files or pipes as described in
Section 2.2.
.UL stdin ,
.UL stdout
and
.UL stderr
are pre-defined in the I/O library
as the standard input, output and error files;
they may be used anywhere an object of type
.UL FILE\ \(**
can be.
They are
constants, however,
.ul
not
variables,
so don't try to assign to them.
.PP
With some of the preliminaries out of the way,
we can now write
.IT wc .
The basic design
is one that has been found
convenient for many programs:
if there are command-line arguments, they are processed in order.
If there are no arguments, the standard input
is processed.
This way the program can be used stand-alone
or as part of a larger process.
.P1
#include <stdio.h>
.sp 0.5v
main(argc, argv)	/\(** wc: count lines, words, chars \(**/
int argc;
char \(**argv[\|];
{
	int c, i, inword;
	FILE \(**fp, \(**fopen(\|);
	long linect, wordct, charct;
	long tlinect = 0, twordct = 0, tcharct = 0;
.sp 0.5v
	i = 1;
	fp = stdin;
	do {
		if (argc > 1 && (fp=fopen(argv[i], "r")) == NULL) {
			fprintf(stderr, "wc: can't open %s\n", argv[i]);
			continue;
		}
		linect = wordct = charct = inword = 0;
		while ((c = getc(fp)) != EOF) {
			charct++;
			if (c == '\n')
				linect++;
			if (c == ' ' || c == '\t' || c == '\n')
				inword = 0;
			else if (inword == 0) {
				inword = 1;
				wordct++;
			}
		}
		printf("%7ld %7ld %7ld", linect, wordct, charct);
		printf(argc > 1 ? " %s\n" : "\n", argv[i]);
		fclose(fp);
		tlinect += linect;
		twordct += wordct;
		tcharct += charct;
	} while (++i < argc);
	if (argc > 2)
		printf("%7ld %7ld %7ld total\n", tlinect, twordct,
			tcharct);
	exit(0);
}
.P2
The function
.UL fprintf
is identical to
.UL printf ,
save that the first argument is a file pointer
that specifies the file to be
written.
.PP
The function
.UL fclose
is the inverse of
.UL fopen ;
it breaks the connection between the file pointer and the external name
that was established by
.UL fopen ,
freeing the
file pointer for another file.
Since there is a limit on the number
of files
that a program may have open simultaneously,
it's a good idea to free things when they are no longer needed.
There is also another reason to call
.UL fclose
on an output file\-it flushes the buffer
in which
.UL putc
is collecting output.
.UL fclose \& (
is called automatically for each open file
when a program terminates normally.)
.SH "3.2 \|Error Handling\-Stderr and Exit"
.PP
.UL stderr
is assigned to a program in the same way that
.UL stdin
and
.UL stdout
are.
Output written on
.UL stderr
appears on the user's terminal
even if the standard output is redirected.
.IT wc
writes its diagnostics on
.UL stderr
instead of
.UL stdout
so that if one of the files can't
be accessed for some reason,
the message
finds its way to the user's terminal instead of disappearing
down a pipeline
or into an output file.
.PP
A program can also signal errors in another way,
using the function
.UL exit
to terminate program execution.
The argument of
.UL exit
is available to whatever process
called it (see Section 5.3),
so the success or failure
of the program can be tested by another program
that uses this one as a sub-process.
By convention, a return value of 0
signals that all is well;
non-zero values signal abnormal situations.
.PP
.UL exit
itself
calls
.UL fclose
for each open output file,
to flush out any buffered output,
then calls
a routine named
.UL _exit .
The function
.UL _exit
causes immediate termination without any buffer flushing;
it may be called directly if desired.
.SH "3.3 \|Miscellaneous I/O Functions"
.PP
The standard I/O library provides several other I/O functions
besides those we have illustrated above.
.PP
Normally output with
.UL putc ,
etc., is buffered (except to
.UL stderr );
to force it out immediately, use
.UL fflush(fp) .
.PP
.UL fscanf
is identical to
.UL scanf ,
except that its first argument is a file pointer
(as with
.UL fprintf )
that specifies the file from which the input comes;
it returns
.UL EOF
at end of file.
.PP
The functions
.UL sscanf
and
.UL sprintf
are identical to
.UL fscanf
and
.UL fprintf ,
except that the first argument names a character string
instead of a file pointer.
The conversion is done from the string
for
.UL sscanf
and into it for
.UL sprintf .
.PP
.UL fgets(buf,\ size,\ fp)
copies the next line from
.UL fp ,
up to and including a new-line,
into
.UL buf ;
at most
.UL size-1
characters are copied;
it returns
.UL NULL
at end of file.
.UL fputs(buf,\ fp)
writes the string in
.UL buf
onto file
.UL fp .
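.PP
As an illustration of line-at-a-time input,
the sketch below counts the lines in a file with
.UL fgets .
The function name
.UL count_lines
and the buffer size
.UL LINESIZE
are hypothetical;
the only subtlety is that a line longer than the buffer is delivered in several pieces.

```c
#include <stdio.h>
#include <string.h>

#define LINESIZE 128	/* arbitrary buffer size chosen for this sketch */

/* Count the lines on fp.  A line longer than the buffer arrives in
   several pieces, so only a piece that ends in a new-line, or a
   final unterminated piece, completes a line. */
long count_lines(FILE *fp)
{
	char buf[LINESIZE];
	long lines = 0;
	int partial = 0;	/* nonzero while inside an unfinished line */
	size_t len;

	while (fgets(buf, LINESIZE, fp) != NULL) {
		len = strlen(buf);
		if (len > 0 && buf[len - 1] == '\n') {
			lines++;
			partial = 0;
		} else
			partial = 1;
	}
	return lines + partial;
}
```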
.PP
The function
.UL ungetc(c,\ fp)
``pushes back'' the character
.UL c
onto the input stream
.UL fp ;
a subsequent call to
.UL getc ,
.UL fscanf ,
etc.,
will encounter
.UL c .
Only one character of push-back per file is permitted.
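.PP
Push-back is most useful when a routine must read one character too many
to discover that it has finished.
The following sketch reads an unsigned decimal number and returns the
terminating character to the stream;
the name
.UL read_number
and the use of -1 for ``no digits found'' are assumptions of the example.

```c
#include <stdio.h>
#include <ctype.h>

/* Read an unsigned decimal number from fp.  The first character that
   is not a digit is pushed back so that the next getc sees it.
   Returns -1 if no digit was found. */
long read_number(FILE *fp)
{
	int c;
	long value = 0;
	int ndigits = 0;

	while ((c = getc(fp)) != EOF && isdigit(c)) {
		value = 10 * value + (c - '0');
		ndigits++;
	}
	if (c != EOF)
		ungetc(c, fp);	/* one character of push-back is enough */
	return ndigits > 0 ? value : -1L;
}
```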
.SH "4. \|LOW-LEVEL I/O"
.PP
This section describes the
bottom level of I/O on the
.UC UNIX
system.
The lowest level of I/O in
.UC UNIX
provides no buffering or any other services;
it is in fact a direct entry into the operating system.
You are entirely on your own,
but on the other hand,
you have the most control over what happens.
And since the calls and usage are quite simple,
this isn't as bad as it sounds.
.SH "4.1 \|File Descriptors"
.PP
In the
.UC UNIX
operating system,
all input and output is done
by reading or writing files,
because all peripheral devices, even the user's terminal,
are files in the file system.
This means that a single, homogeneous interface
handles all communication between a program and peripheral devices.
.PP
In the most general case,
before reading or writing a file,
it is necessary to inform the system
of your intent to do so,
a process called
``opening'' the file.
If you are going to write on a file,
it may also be necessary to create it.
The system checks your right to do so
(Does the file exist?
Do you have permission to access it?),
and if all is well,
returns a small non-negative integer
called a
.ul
file descriptor.
Whenever I/O is to be done on the file,
the file descriptor is used instead of the name to identify the file.
(This is roughly analogous to the use of
.UC READ(5,\|.\|.\|.\|)
and
.UC WRITE(6,\|.\|.\|.\|)
in Fortran.)
All
information about an open file is maintained by the system;
the user program refers to the file
only
by the file descriptor.
.PP
The file pointers discussed in Section 3
are similar in spirit to file descriptors,
but file descriptors are more fundamental.
A file pointer is a pointer to a structure that contains,
among other things, the file descriptor for the file in question.
.PP
Since input and output involving the user's terminal
are so common,
special arrangements exist to make this convenient.
When the command interpreter (the
``shell'')
runs a program,
it opens
three files, with file descriptors 0, 1, and 2,
called the standard input,
the standard output, and the standard error output.
All of these are normally connected to the terminal,
so if a program reads file descriptor 0
and writes file descriptors 1 and 2,
it can do terminal I/O
without worrying about opening the files.
.PP
If I/O is redirected
to and from files with
.UL <
and
.UL > ,
as in
.P1
prog <infile >outfile
.P2
the shell changes the default assignments for file descriptors
0 and 1
from the terminal to the named files.
Similar observations hold if the input or output is associated with a pipe.
Normally file descriptor 2 remains attached to the terminal,
so error messages can go there.
In all cases,
the file assignments are changed by the shell,
not by the program.
The program does not need to know where its input
comes from nor where its output goes,
so long as it uses file 0 for input and 1 and 2 for output.
.SH "4.2 \|Read and Write"
.PP
All input and output is done by
two functions called
.UL read
and
.UL write .
For both, the first argument is a file descriptor.
The second argument is a buffer in your program where the data is to
come from or go to.
The third argument is the number of bytes to be transferred.
The calls are
.P1
n_read = read(fd, buf, n);
.sp 0.5v
n_written = write(fd, buf, n);
.P2
Each call returns a byte count
which is the number of bytes actually transferred.
On reading,
the number of bytes returned may be less than
the number asked for,
because fewer than
.UL n
bytes remained to be read.
(When the file is a terminal,
.UL read
normally reads only up to the next new-line,
which is generally less than what was requested.)
A return value of zero bytes implies end of file,
and
.UL -1
indicates an error of some sort.
For writing, the returned value is the number of bytes
actually written;
it is generally an error if this isn't equal
to the number supposed to be written.
.PP
The number of bytes to be read or written is quite arbitrary.
The two most common values are
1,
which means one character at a time
(``unbuffered''),
and
512,
which corresponds to a physical block size on many peripheral devices.
This latter size will be most efficient,
but even character at a time I/O
is not inordinately expensive.
.PP
Putting these facts together,
we can write a simple program to copy
its input to its output.
This program will copy anything to anything,
since the input and output can be redirected to any file or device.
.P1
#define	BUFSIZE	512	/\(** best size for PDP-11 UNIX \(**/
.sp 0.5v
main(\|)	/\(** copy input to output \(**/
{
	char	buf[BUFSIZE];
	int	n;
.sp 0.5v
	while ((n = read(0, buf, BUFSIZE)) > 0)
		write(1, buf, n);
	exit(0);
}
.P2
If the file size is not a multiple of
.UL BUFSIZE ,
some
.UL read
will return a smaller number of bytes
to be written by
.UL write ;
the next call to
.UL read
after that
will return zero.
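.PP
Because a single
.UL read
may deliver fewer bytes than were asked for,
a program that needs a full record has to ask again.
The sketch below shows one way to do that;
the name
.UL read_full
and its return convention are hypothetical,
and the header
.UL unistd.h
is where present-day systems declare
.UL read ,
which is an assumption beyond what this paper describes.

```c
#include <unistd.h>

/* Read until n bytes have arrived, end of file is reached, or an
   error occurs.  Returns the number of bytes stored in buf, or -1
   if read fails before anything was read. */
long read_full(int fd, char *buf, long n)
{
	long total = 0;
	long got;

	while (total < n) {
		got = read(fd, buf + total, (size_t)(n - total));
		if (got < 0)
			return total > 0 ? total : -1L;
		if (got == 0)
			break;		/* end of file */
		total += got;
	}
	return total;
}
```

A return value smaller than n therefore means end of file or an error,
not merely a short transfer from a terminal or pipe.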
.if t .bp
.PP
It is instructive to see how
.UL read
and
.UL write
can be used to construct
higher level routines like
.UL getchar ,
.UL putchar ,
etc.
For example,
here is a version of
.UL getchar
which does unbuffered input.
.P1
#define	CMASK	0377	/\(** for making char's > 0 \(**/
.sp 0.5v
getchar(\|)	/\(** unbuffered single character input \(**/
{
	char c;
.sp 0.5v
	return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
}
.P2
.UL c
.ul
must
be declared
.UL char ,
because
.UL read
accepts a character pointer.
The character being returned must be masked with
.UL 0377
to ensure that it is positive;
otherwise sign extension may make it negative.
(The constant
.UL 0377
is appropriate for the
.UC PDP -11
but not necessarily for other machines.)
.PP
The second version of
.UL getchar
does input in big chunks,
and hands out the characters one at a time.
.P1
#define	CMASK	0377	/\(** for making char's > 0 \(**/
#define	BUFSIZE	512
.sp 0.5v
getchar(\|)	/\(** buffered version \(**/
{
	static char	buf[BUFSIZE];
	static char	\(**bufp = buf;
	static int	n = 0;
.sp 0.5v
	if (n == 0) {	/\(** buffer is empty \(**/
		n = read(0, buf, BUFSIZE);
		bufp = buf;
	}
	return((--n >= 0) ? \(**bufp++ & CMASK : EOF);
}
.P2
.SH "4.3 \|Open, Creat, Close, Unlink"
.PP
Other than the default
standard input, output and error files,
you must explicitly open files in order to
read or write them.
There are two system entry points for this,
.UL open
and
.UL creat
[sic].
.PP
.UL open
is rather like the
.UL fopen
discussed in the previous section,
except that instead of returning a file pointer,
it returns a file descriptor,
which is just an
.UL int .
.P1
int fd;
.sp 0.5v
fd = open(name, rwmode);
.P2
As with
.UL fopen ,
the
.UL name
argument
is a character string corresponding to the external file name.
The access mode argument
is different, however:
.UL rwmode
is 0 for read, 1 for write, and 2 for read and write access.
.UL open
returns
.UL -1
if any error occurs;
otherwise it returns a valid file descriptor.
.PP
It is an error to
try to
.UL open
a file that does not exist.
The entry point
.UL creat
is provided to create new files,
or to re-write old ones.
.P1
fd = creat(name, pmode);
.P2
returns a file descriptor
if it was able to create the file
called
.UL name ,
and
.UL -1
if not.
If the file
already exists,
.UL creat
will truncate it to zero length;
it is not an error to
.UL creat
a file that already exists.
.PP
If the file is brand new,
.UL creat
creates it with the
.ul
protection mode
specified by
the
.UL pmode
argument.
In the
.UC UNIX
file system,
there are nine bits of protection information
associated with a file,
controlling read, write and execute permission for
the owner of the file,
for the owner's group,
and for all others.
Thus a three-digit octal number
is most convenient for specifying the permissions.
For example,
0755
specifies read, write and execute permission for the owner,
and read and execute permission for the group and everyone else.
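.PP
The correspondence between the octal digits and the nine bits
can be made concrete with a small sketch that expands a mode
into the letters r, w and x, with a minus sign for each permission that is absent.
The function name
.UL mode_string
is hypothetical.

```c
/* Expand the nine protection bits of mode into a string such as
   "rwxr-xr-x".  out must have room for ten characters. */
void mode_string(unsigned mode, char *out)
{
	static const char letters[] = "rwxrwxrwx";
	int i;

	/* 0400 is the owner's read bit; each shift moves one bit right */
	for (i = 0; i < 9; i++)
		out[i] = (mode & (0400u >> i)) ? letters[i] : '-';
	out[9] = '\0';
}
```

Called with 0755 it produces rwxr-xr-x, and with 0644 it produces rw-r--r--.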
.PP
To illustrate,
here is a simplified version of
the
.UC UNIX
utility
.IT cp ,
a program which copies one file to another.
(The main simplification is that our version
copies only one file,
and does not permit the second argument
to be a directory.)
.P1
#define NULL 0
#define BUFSIZE 512
#define PMODE 0644 /\(** RW for owner, R for group, others \(**/
.sp 0.5v
main(argc, argv)	/\(** cp: copy f1 to f2 \(**/
int argc;
char \(**argv[\|];
{
	int	f1, f2, n;
	char	buf[BUFSIZE];
.sp 0.5v
	if (argc != 3)
		error("Usage: cp from to", NULL);
	if ((f1 = open(argv[1], 0)) == -1)
		error("cp: can't open %s", argv[1]);
	if ((f2 = creat(argv[2], PMODE)) == -1)
		error("cp: can't create %s", argv[2]);
.sp 0.5v
	while ((n = read(f1, buf, BUFSIZE)) > 0)
		if (write(f2, buf, n) != n)
			error("cp: write error", NULL);
	exit(0);
}
.P2
.P1
error(s1, s2)	/\(** print error message and die \(**/
char \(**s1, \(**s2;
{
	printf(s1, s2);
	printf("\n");
	exit(1);
}
.P2
.PP
As we said earlier,
there is a limit (typically 15-25)
on the number of files which a program
may have open simultaneously.
Accordingly, any program which intends to process
many files must be prepared to re-use
file descriptors.
The routine
.UL close
breaks the connection between a file descriptor
and an open file,
and frees the
file descriptor for use with some other file.
Termination of a program
via
.UL exit
or return from the main program closes all open files.
.PP
The function
.UL unlink(filename)
removes the file
.UL filename
from the file system.
.SH "4.4 \|Random Access\-Seek and Lseek"
.PP
File I/O is normally sequential:
each
.UL read
or
.UL write
takes place at a position in the file
right after the previous one.
When necessary, however,
a file can be read or written in any arbitrary order.
The
system call
.UL lseek
provides a way to move around in
a file without actually reading
or writing:
.P1
lseek(fd, offset, origin);
.P2
forces the current position in the file
whose descriptor is
.UL fd
to move to position
.UL offset ,
which is taken relative to the location
specified by
.UL origin .
Subsequent reading or writing will begin at that position.
.UL offset
is
a
.UL long ;
.UL fd
and
.UL origin
are
.UL int 's.
.UL origin
can be 0, 1, or 2 to specify that
.UL offset
is to be
measured from
the beginning, from the current position, or from the
end of the file respectively.
For example,
to append to a file,
seek to the end before writing:
.P1
lseek(fd, 0L, 2);
.P2
To get back to the beginning (``rewind''),
.P1
lseek(fd, 0L, 0);
.P2
Notice the
.UL 0L
argument;
it could also be written as
.UL (long)\ 0 .
.PP
With
.UL lseek ,
it is possible to treat files more or less like large arrays,
at the price of slower access.
For example, the following simple function reads any number of bytes
from any arbitrary place in a file.
.P1
get(fd, pos, buf, n) /\(** read n bytes from position pos \(**/
int fd, n;
long pos;
char \(**buf;
{
	lseek(fd, pos, 0);	/\(** get to pos \(**/
	return(read(fd, buf, n));
}
.P2
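.PP
The function above ignores the value returned by
.UL lseek .
A slightly more defensive sketch, written for a present-day system, follows.
The names
.UL get_at
and
.UL file_size
are hypothetical;
the type
.UL off_t
and the symbols
.UL SEEK_SET
and
.UL SEEK_END
(the modern spellings of origins 0 and 2)
are assumptions about the host system, not something this paper defines.

```c
#include <sys/types.h>
#include <unistd.h>

/* Like get, but reports a failed lseek instead of reading from the
   wrong place.  Returns the byte count from read, or -1. */
long get_at(int fd, long pos, char *buf, int n)
{
	if (lseek(fd, (off_t)pos, SEEK_SET) == (off_t)-1)
		return -1L;
	return (long)read(fd, buf, (size_t)n);
}

/* Size of the file in bytes, found by seeking to the end;
   -1 if the descriptor cannot be positioned (a pipe, for example). */
long file_size(int fd)
{
	return (long)lseek(fd, (off_t)0, SEEK_END);
}
```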
.PP
Before Version 7,
the basic entry point to the
.UC UNIX
I/O system
was called
.UL seek .
.UL seek
is identical to
.UL lseek ,
except that its
.UL offset
argument is an
.UL int
rather than a
.UL long .
Accordingly,
since
.UC PDP -11
integers have only 16 bits,
the
.UL offset
specified
for
.UL seek
is limited to 65,535;
for this reason,
.UL origin
values of 3, 4, 5 cause
.UL seek
to multiply the given offset by 512
(the number of bytes in one physical block)
and then interpret
.UL origin
as if it were 0, 1, or 2 respectively.
Thus to get to an arbitrary place in a large file
requires two seeks, first one which selects
the block, then one which
has
.UL origin
equal to 1 and moves to the desired byte within the block.
.SH "4.5 \|Error Processing"
.PP
The routines discussed in this section,
and in fact all the routines which are direct entries into the system,
can incur errors.
Usually they indicate an error by returning a value of \-1.
Sometimes it is nice to know what sort of error occurred;
for this purpose all these routines, when appropriate,
leave an error number in the external cell
.UL errno .
The meanings of the various error numbers are
listed
in the introduction to Section II
of the
.I
.UC UNIX
Programmer's Manual,
.R
so your program can, for example, determine if
an attempt to open a file failed because it did not exist
or because the user lacked permission to read it.
Perhaps more commonly,
you may want to print out the
reason for failure.
The routine
.UL perror
will print a message associated with the value
of
.UL errno ;
more generally,
.UL sys\_errlist
is an array of character strings which can be indexed
by
.UL errno
and printed by your program.
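.PP
As a sketch of how a program might both report and classify a failure,
the function below saves
.UL errno
immediately after a failed
.UL open
and prints the matching message.
The name
.UL open_for_reading
is hypothetical.
The example assumes a present-day C library:
it uses the standard function
.UL strerror
to obtain the message text,
and the symbolic mode
.UL O_RDONLY
in place of the literal 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Try to open name for reading.  On failure, print the reason on the
   standard error output and return -1, with *err set to the value
   that errno had just after the failed open. */
int open_for_reading(const char *name, int *err)
{
	int fd = open(name, O_RDONLY);

	if (fd == -1) {
		*err = errno;	/* save it: later calls may change errno */
		fprintf(stderr, "can't open %s: %s\n", name, strerror(*err));
		return -1;
	}
	*err = 0;
	return fd;
}
```

The caller can then compare the saved value with constants such as ENOENT
(no such file) or EACCES (permission denied).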
.SH "5. \|PROCESSES"
.PP
It is often easier to use a program written
by someone else than to invent one's own.
This section describes how to
execute a program from within another.
.SH "5.1 \|The ``System'' Function"
.PP
The easiest way to execute a program from another
is to use
the standard library routine
.UL system .
.UL system
takes one argument, a command string exactly as typed
at the terminal
(except for the new-line at the end)
and executes it.
For instance, to time-stamp the output of a program,
.P1
main(\|)
{
	system("date");
	/\(** rest of processing \(**/
}
.P2
If the command string has to be built from pieces,
the in-memory formatting capabilities of
.UL sprintf
may be useful.
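.PP
A minimal sketch of building such a command string follows.
It is an illustration under assumptions:
the name
.UL build_command
and the command it constructs are hypothetical,
and it uses
.UL snprintf ,
a bounded relative of
.UL sprintf
found in later C libraries,
so that an over-long name cannot overrun the buffer.

```c
#include <stdio.h>

/* Build "prog file >file.out" in buf.  Returns 0 on success, or -1
   if the command would not fit in size bytes. */
int build_command(char *buf, size_t size, const char *prog, const char *file)
{
	int n = snprintf(buf, size, "%s %s >%s.out", prog, file, file);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting buffer can be handed directly to system.
The sketch does no quoting, so file names that contain blanks
or shell metacharacters would need further care.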
.PP
Remember that
.UL getc
and
.UL putc
normally buffer their input and output;
terminal I/O will not be properly synchronized unless
this buffering is defeated.
For output, use
.UL fflush ;
for input, see
.UL setbuf
in the appendix.
.SH "5.2 \|Low-Level Process Creation\-Execl and Execv"
.PP
If you're not using the standard library,
or if you need finer control over what
happens,
you will have to construct calls to other programs
using the more primitive routines that the standard
library's
.UL system
routine is based on.
.PP
The most basic operation is to execute another program
.ul
without
.IT returning ,
by using the routine
.UL execl .
To print the date as the last action of a running program,
use
.P1
execl("/bin/date", "date", NULL);
.P2
The first argument to
.UL execl
is the
.ul
file name
of the command; you have to know where it is found
in the file system.
The second argument is conventionally
the program name
(that is, the last component of the file name),
but this is seldom used except as a place-holder.
If the command takes arguments, they are strung out after
this;
the end of the list is marked by a
.UL NULL
argument.
.PP
The
.UL execl
call
overlays the existing program with
the new one,
runs that, then exits.
There is
.ul
no
return to the original program.
.PP
More realistically,
a program might fall into two or more phases
that communicate only through temporary files.
Here it is natural to make the second pass
simply an
.UL execl
call from the first.
.PP
The one exception to the rule that the original program never gets control
back occurs when there is an error, for example if the file can't be found
or is not executable.
If you don't know where
.UL date
is located, say
.P1
execl("/bin/date", "date", NULL);
execl("/usr/bin/date", "date", NULL);
fprintf(stderr, "Someone stole 'date'\n");
.P2
.PP
A variant of
.UL execl
called
.UL execv
is useful when you don't know in advance how many arguments there are going to be.
The call is
.P1
execv(filename, argp);
.P2
where
.UL argp
is an array of pointers to the arguments;
the last pointer in the array must be
.UL NULL
so
.UL execv
can tell where the list ends.
As with
.UL execl ,
.UL filename
is the file in which the program is found, and
.UL argp[0]
is the name of the program.
(This arrangement is identical to the
.UL argv
array for program arguments.)
.PP
Neither of these routines provides the niceties of normal command execution.
There is no automatic search of multiple directories\-you
have to know precisely where the command is located.
Nor do you get the expansion of metacharacters like
.UL < ,
.UL > ,
.UL \(** ,
.UL ? ,
and
.UL [\|]
in the argument list.
If you want these, use
.UL execl
to invoke the shell
.UL sh ,
which then does all the work.
Construct a string
.UL commandline
that contains the complete command as it would have been typed
at the terminal, then say
.P1
execl("/bin/sh", "sh", "-c", commandline, NULL);
.P2
The shell is assumed to be at a fixed place,
.UL /bin/sh .
Its argument
.UL -c
says to treat the next argument
as a whole command line, so it does just what you want.
The only problem is in constructing the right information
in
.UL commandline .
.SH "5.3 \|Control of Processes\-Fork and Wait"
.PP
So far what we've talked about isn't really all that useful by itself.
Now we will show how to regain control after running
a program with
.UL execl
or
.UL execv .
Since these routines simply overlay the new program on the old one,
to save the old one requires that it first be split into
two copies;
one of these can be overlaid, while the other waits for the new,
overlaying program to finish.
The splitting is done by a routine called
.UL fork :
.P1
proc_id = fork(\|);
.P2
splits the program into two copies, both of which continue to run.
The only difference between the two is the value of
.UL proc_id ,
the ``process id.''
In one of these processes (the ``child''),
.UL proc_id
is zero.
In the other
(the ``parent''),
.UL proc_id
is non-zero; it is the process number of the child.
Thus the basic way to call, and return from,
another program is
.P1
if (fork(\|) == 0)
	execl("/bin/sh", "sh", "-c", cmd, NULL);	/\(** in child \(**/
.P2
And in fact, except for handling errors, this is sufficient.
The
.UL fork
makes two copies of the program.
In the child, the value returned by
.UL fork
is zero, so it calls
.UL execl
which runs the command in
.UL cmd
and then dies.
In the parent,
.UL fork
returns non-zero
so it skips the
.UL execl .
(If there is any error,
.UL fork
returns
.UL -1 ).
.PP
More often, the parent wants to wait for the child to terminate
before continuing itself.
This can be done with
the function
.UL wait :
.P1
int status;
.sp 0.5v
if (fork(\|) == 0)
	execl(\ .\|.\|.\ );
wait(&status);
.P2
This still doesn't handle any abnormal conditions, such as a failure
of the
.UL execl
or
.UL fork ,
or the possibility that there might be more than one child running simultaneously.
(The
.UL wait
returns the
process id
of the terminated child, if you want to check it against the value
returned by
.UL fork .)
Finally, this fragment doesn't deal with any
funny behavior on the part of the child
(which is reported in
.UL status ).
Still, these three lines
are the heart of the standard library's
.UL system
routine,
which we'll show in a moment.
.PP
The
.UL status
returned by
.UL wait
encodes in its low-order eight bits
the system's idea of the child's termination status;
it is 0 for normal termination and non-zero to indicate
various kinds of problems.
The next higher eight bits are taken from the argument
of the call to
.UL exit
which caused a normal termination of the child process.
It is good coding practice
for all programs to return meaningful
status.
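.PP
The pieces described so far can be combined into a routine
that also checks for the failures mentioned above.
What follows is only a sketch under some assumptions:
.UL run_cmd
is a made-up name,
the code uses the prototypes and headers of later C systems,
and it relies on the macros
.UL WIFEXITED
and
.UL WEXITSTATUS
from
.UL sys/wait.h
(not mentioned in this paper)
to pick apart
.UL status
instead of shifting and masking by hand.
.P1
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd through the shell and wait for it.
   Returns the argument the child gave to exit, or -1 if
   fork or wait failed or the child terminated abnormally. */
int run_cmd(char *cmd)
{
	pid_t pid, w;
	int status;

	pid = fork();
	if (pid == -1)
		return -1;	/* fork failed */
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
		_exit(127);	/* execl returned, so it failed */
	}
	while ((w = wait(&status)) != pid && w != -1)
		;	/* skip other children that finish first */
	if (w == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
.P2
The loop around
.UL wait
deals with the possibility that some other child terminates first.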
.PP
When a program is called by the shell,
the three file descriptors
0, 1, and 2 are set up pointing at the right files,
and all other possible file descriptors
are available for use.
When this program calls another one,
correct etiquette suggests making sure the same conditions
hold.
Neither
.UL fork
nor the
.UL exec
calls affects open files in any way.
If the parent is buffering output
that must come out before output from the child,
the parent must flush its buffers
before the
.UL execl .
Conversely,
if a caller buffers an input stream,
the called program will lose any information
that has been read by the caller.
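.PP
The flushing rule might be followed as in the sketch below,
which writes a message through the standard I/O library
and makes sure it is out before the child can produce anything.
The name
.UL announce_and_run
is invented for this example,
and the code assumes the prototypes and headers of later C systems.
Flushing before the
.UL fork
rather than just before the
.UL execl
is one way of satisfying the rule.
.P1
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Print msg, then run cmd through the shell and wait for it.
   Returns the status from wait, or -1 on failure. */
int announce_and_run(char *msg, char *cmd)
{
	pid_t pid, w;
	int status;

	puts(msg);	/* may still be sitting in a buffer */
	if (fflush(stdout) == EOF)	/* force it out first */
		return -1;
	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
		_exit(127);
	}
	while ((w = wait(&status)) != pid && w != -1)
		;
	return w == -1 ? -1 : status;
}
.P2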
.SH "5.4 \|Pipes"
.PP
A
.ul
pipe
is an I/O channel intended for use
between two cooperating processes:
one process writes into the pipe,
while the other reads.
The system looks after buffering the data and synchronizing
the two processes.
Most pipes are created by the shell,
as in
.P1
ls | pr
.P2
which connects the standard output of
.UL ls
to the standard input of
.UL pr .
Sometimes, however, it is most convenient
for a process to set up its own plumbing;
in this section, we will illustrate how
the pipe connection is established and used.
.PP
The system call
.UL pipe
creates a pipe.
Since a pipe is used for both reading and writing,
two file descriptors are returned;
the actual usage is like this:
.P1
int	fd[2];
.sp 0.5v
stat = pipe(fd);
if (stat == -1)
	/\(** there was an error .\|.\|. \(**/
.P2
.UL fd
is an array of two file descriptors, where
.UL fd[0]
is the read side of the pipe and
.UL fd[1]
is for writing.
These may be used in
.UL read ,
.UL write
and
.UL close
calls just like any other file descriptors.
.PP
If a process reads a pipe which is empty,
it will wait until data arrives;
if a process writes into a pipe which
is too full, it will wait until the pipe empties somewhat.
If the write side of the pipe is closed,
a subsequent
.UL read
will encounter end of file.
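.PP
These rules can be seen at work within a single process.
The sketch below, whose name
.UL pipe_echo
is invented for the purpose,
writes a short message into a pipe, closes the write side,
and reads until end of file.
It assumes the prototypes and headers of later C systems,
and it assumes that the message is short enough to fit in the pipe
without the writer having to wait.
.P1
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg through a pipe and read it back into buf.
   Returns the number of bytes read before end of file,
   or -1 on error.  msg is assumed to fit in the pipe. */
int pipe_echo(char *msg, char *buf, int size)
{
	int fd[2], n = 0, total = 0;
	size_t len = strlen(msg);

	if (pipe(fd) == -1)
		return -1;
	if (write(fd[1], msg, len) != (ssize_t) len) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	close(fd[1]);	/* no writer is left, so read can see EOF */
	while (total < size &&
	    (n = read(fd[0], buf + total, size - total)) > 0)
		total += n;
	close(fd[0]);
	return n < 0 ? -1 : total;
}
.P2
If the write side were left open,
the last
.UL read
would wait for more data instead of reporting end of file.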
.PP
To illustrate the use of pipes in a realistic setting,
let us write a function called
.UL popen(cmd,\ mode) ,
which creates a process
.UL cmd
(just as
.UL system
does),
and returns a file descriptor that will either
read or write that process, according to
.UL mode .
That is,
the call
.P1
fout = popen("pr", WRITE);
.P2
creates a process that executes
the
.UL pr
command;
subsequent
.UL write
calls using the file descriptor
.UL fout
will send their data to that process
through the pipe.
.PP
.UL popen
first creates the
pipe with a
.UL pipe
system call;
it then
.UL fork s
to create two copies of itself.
The child decides whether it is supposed to read or write,
closes the other side of the pipe,
then calls the shell (via
.UL execl )
to run the desired process.
The parent likewise closes the end of the pipe it does not use.
These closes are necessary to make end-of-file tests work properly.
For example, if a child that intends to read
fails to close the write end of the pipe, it will never
see the end of the pipe file, just because there is one writer
potentially active.
.P1
#include <stdio.h>
.sp 0.5v
#define	READ	0
#define	WRITE	1
#define	tst(a, b)	(mode == READ ? (b) : (a))
static	int	popen_pid;
.sp 0.5v
popen(cmd, mode)
char	\(**cmd;
int	mode;
{
	int p[2];
.sp 0.5v
	if (pipe(p) < 0)
		return(NULL);
	if ((popen_pid = fork(\|)) == 0) {
		close(tst(p[WRITE], p[READ]));
		close(tst(0, 1));
		dup(tst(p[READ], p[WRITE]));
		close(tst(p[READ], p[WRITE]));
		execl("/bin/sh", "sh", "-c", cmd, 0);
		_exit(1);	/\(** disaster has occurred if we get here \(**/
	}
	if (popen_pid == -1)
		return(NULL);
	close(tst(p[READ], p[WRITE]));
	return(tst(p[WRITE], p[READ]));
}
.P2
The sequence of
.UL close s
in the child
is a bit tricky.
Suppose
that the task is to create a child process that will read data from the parent.
Then the first
.UL close
closes the write side of the pipe,
leaving the read side open.
The lines
.P1
close(tst(0, 1));
dup(tst(p[READ], p[WRITE]));
.P2
are the conventional way to associate the pipe descriptor
with the standard input of the child.
The
.UL close
closes file descriptor 0,
that is, the standard input.
.UL dup
is a system call that
returns a duplicate of an already open file descriptor.
File descriptors are assigned in increasing order
and the first available one is returned,
so
the effect of the
.UL dup
is to copy the file descriptor for the pipe (read side)
to file descriptor 0;
thus the read side of the pipe becomes the standard input.
(Yes, this is a bit tricky, but it's a standard idiom.)
Finally, the old read side of the pipe is closed.
.PP
A similar sequence of operations takes place
when the child process is supposed to write
to the parent instead of reading from it.
You may find it a useful exercise to step through that case.
.PP
The job is not quite done,
for we still need a function
.UL pclose
to close the pipe created by
.UL popen .
The main reason for using a separate function rather than
.UL close
is that it is desirable to wait for the termination of the child process.
First, the return value from
.UL pclose
indicates whether the process succeeded.
Equally important when a process creates several children
is that only a bounded number of unwaited-for children
can exist, even if some of them have terminated;
performing the
.UL wait
lays the child to rest.
Thus:
.P1
#include <signal.h>
.sp 0.5v
pclose(fd)	/\(** close pipe fd \(**/
int fd;
{
	register r, (\(**hstat)(\|), (\(**istat)(\|), (\(**qstat)(\|);
	int	 status;
	extern int popen_pid;
.sp 0.5v
	close(fd);
	istat = signal(SIGINT, SIG_IGN);
	qstat = signal(SIGQUIT, SIG_IGN);
	hstat = signal(SIGHUP, SIG_IGN);
	while ((r = wait(&status)) != popen_pid && r != -1);
	if (r == -1)
		status = -1;
	signal(SIGINT, istat);
	signal(SIGQUIT, qstat);
	signal(SIGHUP, hstat);
	return(status);
}
.P2
The calls to
.UL signal
make sure that no interrupts, etc.,
interfere with the waiting process;
this is the topic of the next section.
.PP
The routine as written has the limitation that only one pipe may
be open at once, because of the single shared variable
.UL popen_pid ;
it really should be an array indexed by file descriptor.
A
.UL popen
function, with slightly different arguments and return value, is available
as part of the standard I/O library discussed below.
As currently written, it shares the same limitation.
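.PP
One way the limitation might be lifted is sketched below:
the process number of each child is kept in a table indexed by file descriptor.
This is an illustration, not the library's code.
The names
.UL popen_fd ,
.UL pclose_fd ,
.UL child_of
and
.UL MAXFD
are invented;
the sketch returns \-1 rather than
.UL NULL
on failure,
uses
.UL dup2
and
.UL waitpid
(later system calls, assumed to be available)
in place of the
.UL close
and
.UL dup
idiom and the
.UL wait
loop,
and leaves out the signal handling shown in
.UL pclose .
.P1
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define	READ	0
#define	WRITE	1
#define	MAXFD	64	/* illustrative table size */

static pid_t child_of[MAXFD];	/* child for each pipe fd, 0 if none */

/* Start cmd with a pipe to it; returns a descriptor or -1. */
int popen_fd(char *cmd, int mode)
{
	int p[2], mine, theirs, target, i;
	pid_t pid;

	if (pipe(p) < 0)
		return -1;
	mine = (mode == READ) ? p[READ] : p[WRITE];
	theirs = (mode == READ) ? p[WRITE] : p[READ];
	target = (mode == READ) ? 1 : 0;	/* child's output or input */
	if (mine >= MAXFD || (pid = fork()) == -1) {
		close(p[READ]);
		close(p[WRITE]);
		return -1;
	}
	if (pid == 0) {
		close(mine);
		for (i = 0; i < MAXFD; i++)
			if (child_of[i] != 0)
				close(i);	/* pipes to earlier children */
		if (theirs != target) {
			dup2(theirs, target);
			close(theirs);
		}
		execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
		_exit(127);
	}
	close(theirs);
	child_of[mine] = pid;
	return mine;
}

/* Close a descriptor from popen_fd and wait for its own child.
   Returns the status from waitpid, or -1 on error. */
int pclose_fd(int fd)
{
	pid_t pid;
	int status;

	if (fd < 0 || fd >= MAXFD || child_of[fd] == 0)
		return -1;
	pid = child_of[fd];
	child_of[fd] = 0;
	close(fd);
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return status;
}
.P2
The child closes the descriptors recorded for earlier pipes
for the same reason that it closes its unused end of the new one:
a stray open write side would keep a reader from ever seeing end of file.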
.SH "6. \|SIGNALS\-INTERRUPTS AND ALL THAT"
.PP
This section is concerned with how to
deal gracefully with signals from
the outside world (like interrupts), and with program faults.
Since there's nothing very useful that
can be done from within C about program
faults, which arise mainly from illegal memory references
or from execution of peculiar instructions,
we'll discuss only the outside-world signals:
.IT interrupt ,
which is sent when the
.UC DEL
character is typed;
.IT quit ,
generated by the
.UC FS
character;
.IT hangup ,
caused by hanging up the phone;
and
.IT terminate ,
generated by the
.IT kill
command.
When one of these events occurs,
the signal is sent to
.IT all
processes which were started
from the corresponding terminal;
unless other arrangements have been made,
the signal
terminates the process.
In the
.IT quit
case, a core image file is written for debugging
purposes.
.PP
The routine which alters the default action
is
called
.UL signal .
It has two arguments: the first specifies the signal, and the second
specifies how to treat it.
The first argument is just a number code, but the second is
either the address of a function, or a somewhat strange code
that requests that the signal either be ignored or be
given the default action.
The include file
.UL signal.h
gives names for the various arguments, and should always be included
when signals are used.
Thus
.P1
#include <signal.h>
\ \ .\|.\|.
signal(SIGINT, SIG_IGN);
.P2
causes interrupts to be ignored, while
.P1
signal(SIGINT, SIG_DFL);
.P2
restores the default action of process termination.
In all cases,
.UL signal
returns the previous setting for the signal.
The second argument to
.UL signal
may instead be the name of a function
(which has to be declared explicitly if
the compiler hasn't seen it already).
In this case, the named routine will be called
when the signal occurs.
Most commonly this facility is used
to allow the program to clean up
unfinished business before terminating, for example to
delete a temporary file:
.P1
#include <signal.h>
.sp 0.5v
main(\|)
{
	int onintr(\|);
.sp 0.5v
	if (signal(SIGINT, SIG_IGN) != SIG_IGN)
		signal(SIGINT, onintr);
.sp 0.5v
	/\(** Process .\|.\|. \(**/
.sp 0.5v
	exit(0);
}
.sp 0.5v
onintr(\|)
{
	unlink(tempfile);
	exit(1);
}
.P2
.PP
Why the test and the double call to
.UL signal ?
Recall that signals like interrupt are sent to
.ul
all
processes started from a particular terminal.
Accordingly, when a program is to be run
non-interactively
(started by
.UL & ),
the shell turns off interrupts for it
so it won't be stopped by interrupts intended for foreground processes.
If this program began by announcing that all interrupts were to be sent
to the
.UL onintr
routine regardless,
that would undo the shell's effort to protect it
when run in the background.
.PP
The solution, shown above, is to test the state of interrupt handling,
and to continue to ignore interrupts if they are already being ignored.
The code as written
depends on the fact that
.UL signal
returns the previous state of a particular signal.
If signals were already being ignored, the process should continue to ignore them;
otherwise, they should be caught.
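.PP
The test can be wrapped up in a small routine so that it is applied the same way
to every signal a program wants to catch.
The sketch below assumes the later convention in which a signal routine
takes the signal number as an argument and returns nothing,
unlike the
.UL int
functions shown in this paper;
the names
.UL catch_unless_ignored
and
.UL handler_t
are invented for the example.
.P1
#include <signal.h>

typedef void (*handler_t)(int);	/* later-style signal routine */

/* Arrange for fn to be called on sig, unless sig is being ignored.
   Returns 1 if fn was installed, 0 if the signal stays ignored. */
int catch_unless_ignored(int sig, handler_t fn)
{
	if (signal(sig, SIG_IGN) == SIG_IGN)
		return 0;	/* for example, started with & */
	signal(sig, fn);
	return 1;
}
.P2
The first call to
.UL signal
both asks what the previous state was and leaves the signal ignored,
so nothing has to be undone in the background case.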
.PP
A more sophisticated program may wish to intercept
an interrupt and interpret it as a request
to stop what it is doing
and return to its own command-processing loop.
Think of a text editor:
interrupting a long printout should not cause it
to terminate and lose the work
already done.
The outline of the code for this case is probably best written like this:
.P1
#include <signal.h>
#include <setjmp.h>
jmp_buf	sjbuf;
.sp 0.5v
main(\|)
{
	int (\(**istat)(\|), onintr(\|);
.sp 0.5v
	istat = signal(SIGINT, SIG_IGN);	/\(** save original status \(**/
	setjmp(sjbuf);	/\(** save current stack position \(**/
	if (istat != SIG_IGN)
		signal(SIGINT, onintr);
.sp 0.5v
	/\(** main processing loop \(**/
}
.P2
.P1
onintr(\|)
{
	printf("\enInterrupt\en");
	longjmp(sjbuf, 1);	/\(** return to saved state \(**/
}
.P2
The include file
.UL setjmp.h
declares the type
.UL jmp_buf ,
an object in which the state
can be saved.
.UL sjbuf
is such an object; it is an array of some sort.
The
.UL setjmp
routine then saves
the state of things.
When an interrupt occurs,
a call is forced to the
.UL onintr
routine,
which can print a message, set flags, or whatever.
.UL longjmp
takes as argument an object stored into by
.UL setjmp ,
and restores control
to the location after the call to
.UL setjmp ,
so control (and the stack level) will pop back
to the place in the main routine where
the signal is set up and the main loop entered.
Notice, by the way, that
the signal
gets set again after an interrupt occurs.
This is necessary; most signals are automatically
reset to their default action when they occur.
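.PP
The way control pops back can be tried out without any signals at all.
In the sketch below, an ordinary function stands in for the interrupt routine
and abandons the current pass through a loop.
The names
.UL give_up
and
.UL count_restarts
are invented;
the code assumes the standard C library form of
.UL longjmp ,
whose second argument becomes the value that
.UL setjmp
appears to return,
and it declares as
.UL volatile
the variables that must keep their values across the jump.
.P1
#include <setjmp.h>

static jmp_buf sjbuf;

/* Stand-in for an interrupt routine: go back to the saved state. */
static void give_up(void)
{
	longjmp(sjbuf, 1);
}

/* Walk through items; a negative item abandons the current pass.
   Returns the number of times control came back via longjmp. */
int count_restarts(int *items, int n)
{
	volatile int i = 0, restarts = 0;
	int v;

	if (setjmp(sjbuf) != 0)
		restarts++;	/* arrived here through longjmp */
	while (i < n) {
		v = items[i++];
		if (v < 0)
			give_up();	/* does not return */
	}
	return restarts;
}
.P2
Given the four items 1, \-1, 2, \-1, the routine is restarted twice
and then runs off the end of the list.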
.PP
Some programs that want to detect signals simply can't be stopped
at an arbitrary point,
for example in the middle of updating a linked list.
If the routine called on occurrence of a signal
sets a flag and then
returns instead of calling
.UL exit
or
.UL longjmp ,
execution will continue
at the exact point it was interrupted.
The interrupt flag can then be tested later.
.PP
There is one difficulty associated with this
approach.
Suppose the program is reading the
terminal when the interrupt is sent.
The specified routine is duly called; it sets its flag
and returns.
If it were really true, as we said
above, that ``execution resumes at the exact point it was interrupted,''
the program would continue reading the terminal
until the user typed another line.
This behavior might well be confusing, since the user
might not know that the program is reading;
he presumably would prefer to have the signal take effect instantly.
The method chosen to resolve this difficulty
is to terminate the terminal read when execution
resumes after the signal, returning an error code
which indicates what happened.
.PP
Thus programs which catch and resume
execution after signals should be prepared for ``errors''
which are caused by interrupted
system calls.
(The ones to watch out for are reads from a terminal,
.UL wait ,
and
.UL pause .)
A program
whose
.UL onintr
routine just sets
.UL intflag ,
resets the interrupt signal, and returns,
should usually include code like the following when it reads
the standard input:
.P1
if (getchar(\|) == EOF)
	if (intflag)
		/\(** EOF caused by interrupt \(**/
	else
		/\(** true end-of-file \(**/
.P2
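.PP
At the level of the
.UL read
system call, the same check might look like the sketch below.
It rests on assumptions that go beyond this paper:
on many later systems a handler installed with
.UL signal
causes an interrupted
.UL read
to be restarted rather than terminated,
so the sketch installs its handler with
.UL sigaction
and leaves the restart flag off,
and it recognizes the interrupted call by the error code
.UL EINTR
in
.UL errno .
The names
.UL catch_and_interrupt
and
.UL read_or_interrupt
are invented.
.P1
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t intflag;	/* set when the signal arrives */

static void onsig(int sig)
{
	(void) sig;
	intflag = 1;
}

/* Catch sig so that a system call in progress fails
   instead of being restarted.  Returns 0 or -1. */
int catch_and_interrupt(int sig)
{
	struct sigaction sa;

	sa.sa_handler = onsig;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* SA_RESTART deliberately not set */
	return sigaction(sig, &sa, NULL);
}

/* Read once from fd.  Returns the byte count, 0 at a true end of
   file, -1 on error, or -2 if a caught signal cut the read short. */
int read_or_interrupt(int fd, char *buf, int size)
{
	int n;

	intflag = 0;
	n = read(fd, buf, size);
	if (n == -1 && errno == EINTR && intflag)
		return -2;
	return n;
}
.P2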
.PP
A final subtlety to keep in mind becomes important
when signal-catching is combined with execution of other programs.
Suppose a program catches interrupts, and also includes
a method (like ``!'' in the editor)
whereby other programs can be executed.
Then the code should look something like this:
.P1
if (fork(\|) == 0)
	execl(\ .\|.\|.\ );
signal(SIGINT, SIG_IGN);	/\(** ignore interrupts \(**/
wait(&status);	/\(** until the child is done \(**/
signal(SIGINT, onintr);	/\(** restore interrupts \(**/
.P2
Why is this?
Again, it's not obvious but not really difficult.
Suppose the program you call catches its own interrupts.
If you interrupt the subprogram,
it will get the signal and return to its
main loop, and probably read your terminal.
But the calling program will also pop out of
its wait for the subprogram and read your terminal.
Having two processes reading
your terminal is very unfortunate,
since the system figuratively flips a coin to decide
who should get each line of input.
A simple way out is to have the parent program
ignore interrupts until the child is done.
This reasoning is reflected in the standard I/O library function
.UL system :
.P1
#include <signal.h>
.sp 0.5v
system(s)	/\(** run command string s \(**/
char \(**s;
{
	int status, pid, w;
	register int (\(**istat)(\|), (\(**qstat)(\|);
.sp 0.5v
	if ((pid = fork(\|)) == 0) {
		execl("/bin/sh", "sh", "-c", s, 0);
		_exit(127);
	}
	istat = signal(SIGINT, SIG_IGN);
	qstat = signal(SIGQUIT, SIG_IGN);
	while ((w = wait(&status)) != pid && w != -1)
		;
	if (w == -1)
		status = -1;
	signal(SIGINT, istat);
	signal(SIGQUIT, qstat);
	return(status);
}
.P2
.PP
As an aside on declarations,
the function
.UL signal
obviously has a rather strange second argument.
It is in fact a pointer to a function delivering an integer,
and this is also the type of the signal routine itself.
The two values
.UL SIG_IGN
and
.UL SIG_DFL
have the right type, but are chosen so they coincide with
no possible actual functions.
For the enthusiast, here is how they are defined for the PDP-11;
the definitions should be sufficiently ugly
and nonportable to encourage use of the include file.
.P1
#define	SIG_DFL	(int (\(**)(\|))0
#define	SIG_IGN	(int (\(**)(\|))1
.P2
.SH "References"
.LP
.IP [1]
K. L. Thompson and D. M. Ritchie,
.ul
The
.ul
U\s-1NIX\s+1
.ul
Programmer's Manual,
Bell Laboratories, 1978.
.IP [2]
B. W. Kernighan and D. M. Ritchie,
.ul
The C Programming Language,
Prentice-Hall, Inc., 1978.
.IP [3]
B. W. Kernighan,
.ul
U\s-1NIX\s+1 for Beginners\-Second Edition,
Bell Laboratories, 1978.
.sp 100
.R
.TL
.bd 1 3
\!.bd 1 3
\f1Appendix\-The Standard I/O Library\fP
.AU
.bd 1
\!.bd 1
D. M. Ritchie
.AI
.MH
.PP
The standard I/O library
was designed with the following goals in mind.
.IP 1.
It must be as efficient as possible, both in time and in space,
so that there will be no hesitation in using it
no matter how critical the application.
.IP 2.
It must be simple to use, and also free of the magic
numbers and mysterious calls
whose use mars the understandability and portability
of many programs using older packages.
.IP 3.
The interface provided should be applicable on all machines,
whether or not the programs which implement it are directly portable
to other systems,
or to machines other than the PDP-11 running a version of
.UC UNIX .
.SH "1. \|GENERAL USAGE"
.PP
Each program using the library must have the line
.P1
		#include <stdio.h>
.P2
which defines certain macros and variables.
The routines are in the normal C library,
so no special library argument is needed for loading.
All names in the include file intended only for internal use begin
with an underscore
.UL _
to reduce the possibility
of collision with a user name.
The names intended to be visible outside the package are
.IP \f3stdin\f1 10
The name of the standard input file.
.IP \f3stdout\f1 10
The name of the standard output file.
.IP \f3stderr\f1 10
The name of the standard error file.
.IP \f3EOF\f1 10
is actually \-1, and is the value returned by
the read routines on end-of-file or error.
.IP \f3NULL\f1 10
is a notation for the null pointer, returned by
pointer-valued functions
to indicate an error.
.IP \f3FILE\f1 10
expands to
.UL struct
.UL _iob
and is a useful
shorthand when declaring pointers
to streams.
.IP \f3BUFSIZ\f1 10
is a number (viz. 512)
giving a size suitable for an I/O buffer supplied by the user.
See
.UL setbuf ,
below.
.IP \f3getc,\ getchar,\ putc,\ putchar,\ feof,\ ferror,\ f\&ileno\f1 10
.br
are defined as macros.
Their actions are described below;
they are mentioned here
to point out that it is not possible to
redeclare them
and that they are not actually functions;
thus, for example, they may not have breakpoints set on them.
.PP
The routines in this package
offer the convenience of automatic buffer allocation
and output flushing where appropriate.
The names
.UL stdin ,
.UL stdout ,
and
.UL stderr
are in effect constants and may not be assigned to.
.SH "2. \|CALLS"
.nr PD .4v
.LP
.uL FILE\ \(**fopen(filename,\ type)\ char\ \(**filename,\ \(**type;
.nr PD 0
.IP
.br
opens the file and, if needed, allocates a buffer for it.
.UL filename
is a character string specifying the name.
.UL type
is a character string (not a single character).
It may be
.UL \&"r" ,
.UL \&"w" ,
or
.UL \&"a"
to indicate
intent to read, write, or append.
The value returned is a file pointer.
If it is
.UL NULL
the attempt to open failed.
.if t .bp
.nr PD .4v
.LP
.uL FILE\ \(**freopen(filename,\ type,\ ioptr)\ char\ \(**filename,\ \(**type;\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
The stream named by
.UL ioptr
is closed, if necessary, and then reopened
as if by
.UL fopen .
If the attempt to open fails,
.UL NULL
is returned,
otherwise
.UL ioptr ,
which will now refer to the new file.
Often the reopened stream is
.UL stdin
or
.UL stdout .
.nr PD .4v
.LP
.uL int\ getc(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
returns the next character from the stream named by
.UL ioptr ,
which is a pointer to a file such as returned by
.UL fopen ,
or the name
.UL stdin .
The integer
.UL EOF
is returned on end-of-file or when
an error occurs.
The null character
.UL \e0
is a legal character.
.nr PD .4v
.LP
.uL int\ fgetc(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
acts like
.UL getc
but is a genuine function,
not a macro,
so it can be pointed to, passed as an argument, etc.
.nr PD .4v
.LP
.uL putc(c,\ ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
.UL putc
writes the character
.UL c
on the output stream named by
.UL ioptr ,
which is a value returned from
.UL fopen
or perhaps
.UL stdout
or
.UL stderr .
The character is returned as value,
but
.UL EOF
is returned on error.
.nr PD .4v
.LP
.uL fputc(c,\ ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
acts like
.UL putc
but is a genuine
function, not a macro.
.nr PD .4v
.LP
.uL fclose(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
The file corresponding to
.UL ioptr
is closed after any buffers are emptied.
A buffer allocated by the I/O system is freed.
.UL fclose
is automatic on normal termination of the program.
.nr PD .4v
.LP
.uL fflush(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
Any buffered information on the (output) stream named by
.UL ioptr
is written out.
Output files are normally buffered
if and only if they are not directed to the terminal;
however,
.UL stderr
always starts off unbuffered and remains so unless
.UL setbuf
is used, or unless it is reopened.
.nr PD .4v
.LP
.uL exit(errcode);
.nr PD 0
.IP
.br
terminates the process and returns its argument as status
to the parent.
This is a special version of the routine
which calls
.UL fflush
for each output file.
To terminate without flushing,
use
.UL _exit .
.nr PD .4v
.LP
.uL feof(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
returns non-zero when end-of-file
has occurred on the specified input stream.
.nr PD .4v
.LP
.uL ferror(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
returns non-zero when an error has occurred while reading
or writing the named stream.
The error indication lasts until the file has been closed.
.nr PD .4v
.LP
.uL getchar(\|);
.nr PD 0
.IP
.br
is identical to
.UL getc(stdin) .
.nr PD .4v
.LP
.uL putchar(c);
.nr PD 0
.IP
.br
is identical to
.UL putc(c,\ stdout) .
.nr PD .4v
.LP
.uL char\ \(**fgets(s,\ n,\ ioptr)\ char\ \(**s;\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
reads up to
.UL n-1
characters from the stream
.UL ioptr
into the character pointer
.UL s .
The read stops after a new-line character has been read
or after
.UL n-1
characters, whichever comes first.
The new-line character, if read, is placed in the buffer,
and a null character follows the last character read.
.UL fgets
returns the first argument,
or
.UL NULL
if error or end-of-file occurred.
.nr PD .4v
.LP
.uL fputs(s,\ ioptr)\ char\ \(**s;\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
writes the null-terminated string (character array)
.UL s
on the stream
.UL ioptr .
No new-line is appended.
No value is returned.
.if t .bp
.nr PD .4v
.LP
.uL ungetc(c,\ ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
The argument character
.UL c
is pushed back on the input stream named by
.UL ioptr .
Only one character may be pushed back.
.nr PD .4v
.LP
.uL printf(format,\ a1,\ .\|.\|.\ )\ char\ \(**format;
.br
.uL fprintf(ioptr,\ format,\ a1,\ .\|.\|.\ )\ FILE\ \(**ioptr;\ char\ \(**format;
.br
.uL sprintf(s,\ format,\ a1,\ .\|.\|.\ )\ char\ \(**s,\ \(**format;
.br
.nr PD 0
.IP
.UL printf
writes on the standard output.
.UL fprintf
writes on the named output stream.
.UL sprintf
puts characters in the character array (string)
named by
.UL s .
The specifications are as described in section
.UL printf (3)
of the
.ul
.UC UNIX
.ul
Programmer's Manual.
.nr PD .4v
.LP
.uL scanf(format,\ a1,\ .\|.\|.\ )\ char\ \(**format;
.br
.uL fscanf(ioptr,\ format,\ a1,\ .\|.\|.\ )\ FILE\ \(**ioptr;\ char\ \(**format;
.br
.uL sscanf(s,\ format,\ a1,\ .\|.\|.\ )\ char\ \(**s,\ \(**format;
.nr PD 0
.IP
.br
.UL scanf
reads from the standard input.
.UL fscanf
reads from the named input stream.
.UL sscanf
reads from the character string
supplied as
.UL s .
.UL scanf
reads characters, interprets
them according to a format, and stores the results in its arguments.
Each routine expects as arguments
a control string
.UL format ,
and a set of arguments,
.I
each of which must be a pointer,
.R
indicating where the converted input should be stored.
.if t .sp .4v
.UL scanf
returns as its value the number of successfully matched and assigned input
items.
This can be used to decide how many input items were found.
On end of file,
.UL EOF
is returned; note that this is different
from 0, which means that the next input character does not
match what was called for in the control string.
.RE
.nr PD .4v
.LP
.uL fread(ptr,\ sizeof(\(**ptr),\ nitems,\ ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
reads
.UL nitems
items of data from file
.UL ioptr
into the area beginning at
.UL ptr .
No advance notification
that binary I/O is being done is required;
when, for portability reasons,
it becomes required, it will be done
by adding an additional character to the mode-string on the
.UL fopen
call.
.nr PD .4v
.LP
.uL fwrite(ptr,\ sizeof(\(**ptr),\ nitems,\ ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
Like
.UL fread ,
but in the other direction.
.nr PD .4v
.LP
.uL rewind(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
rewinds the stream
named by
.UL ioptr .
It is not very useful except on input,
since a rewound output file is still open only for output.
.nr PD .4v
.LP
.uL system(string)\ char\ \(**string;
.nr PD 0
.IP
.br
The
.UL string
is executed by the shell as if typed at the terminal.
.nr PD .4v
.LP
.uL getw(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
returns the next word from the input stream named by
.UL ioptr .
.UL EOF
is returned on end-of-file or error,
but since this is a perfectly good
integer,
.UL feof
and
.UL ferror
should be used.
A ``word'' is 16 bits on the
.UC PDP-11 .
.nr PD .4v
.LP
.uL putw(w,\ ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
writes the integer
.UL w
on the named output stream.
.nr PD .4v
.LP
.uL setbuf(ioptr,\ buf)\ FILE\ \(**ioptr;\ char\ \(**buf;
.nr PD 0
.IP
.br
.UL setbuf
may be used after a stream has been opened
but before I/O has started.
If
.UL buf
is
.UL NULL ,
the stream will be unbuffered.
Otherwise the buffer supplied will be used.
It must be a character array of sufficient size:
.P1
char	buf[BUFSIZ];
.P2
'\"	.nr PD .4v
.LP
.uL fileno(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
returns the integer file descriptor associated with the file.
.if t .bp
.nr PD .4v
.LP
.uL fseek(ioptr,\ offset,\ ptrname)\ FILE\ \(**ioptr;\ long\ offset;
.nr PD 0
.IP
.br
The location of the next byte in the stream
named by
.UL ioptr
is adjusted.
.UL offset
is a long integer.
If
.UL ptrname
is 0, the offset is measured from the beginning of the file;
if
.UL ptrname
is 1, the offset is measured from the current read or
write pointer;
if
.UL ptrname
is 2, the offset is measured from the end of the file.
The routine accounts properly for any buffering.
(When this routine is used on
.UC UNIX \& non-
systems,
the offset must be a value returned from
.UL ftell
and the ptrname must be 0).
.nr PD .4v
.LP
.uL long\ ftell(ioptr)\ FILE\ \(**ioptr;
.nr PD 0
.IP
.br
The byte offset, measured from the beginning of the file,
associated with the named stream is returned.
Any buffering is properly accounted for.
(On
.UC UNIX \& non-
systems the value of this call is useful only
for handing to
.UL fseek ,
so as to position the file to the same place it was when
.UL ftell
was called.)
.nr PD .4v
.LP
.uL getpw(uid,\ buf)\ char\ \(**buf;
.nr PD 0
.IP
.br
The password file is searched for the given integer user ID.
If an appropriate line is found, it is copied into
the character array
.UL buf ,
and 0 is returned.
If no line is found corresponding to the user ID
then 1 is returned.
.nr PD .4v
.LP
.uL char\ \(**malloc(num);
.nr PD 0
.IP
.br
allocates
.UL num
bytes.
The pointer returned is sufficiently well aligned to be usable for any purpose.
.UL NULL
is returned if no space is available.
.nr PD .4v
.LP
.uL char\ \(**calloc(num,\ size);
.nr PD 0
.IP
.br
allocates space for
.UL num
items each of size
.UL size .
The space is guaranteed to be set to 0 and the pointer is
sufficiently well aligned to be usable for any purpose.
.UL NULL
is returned if no space is available.
.nr PD .4v
.LP
.uL cfree(ptr)\ char\ \(**ptr;
.nr PD 0
.IP
.br
Space is returned to the pool used by
.UL calloc .
Disorder can be expected if the pointer was not obtained
from
.UL calloc .
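.nr PD .4v
.LP
As an illustration of how several of the calls above fit together,
here is a sketch of a routine that copies one file to another a character at a time.
The name
.UL copy_file
is invented for the example,
and the code is written with the function prototypes of later C compilers.
.P1
#include <stdio.h>

/* Copy the file named from to the file named to.
   Returns the number of characters copied, or -1 on error. */
long copy_file(char *from, char *to)
{
	FILE *in, *out;
	long n = 0;
	int c;	/* int, not char, so that EOF can be told apart */

	if ((in = fopen(from, "r")) == NULL)
		return -1;
	if ((out = fopen(to, "w")) == NULL) {
		fclose(in);
		return -1;
	}
	while ((c = getc(in)) != EOF) {
		if (putc(c, out) == EOF)
			break;
		n++;
	}
	if (ferror(in) || ferror(out))
		n = -1;
	fclose(in);
	if (fclose(out) == EOF)	/* the last buffer is written here */
		n = -1;
	return n;
}
.P2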
.nr PD .4v
.LP
The following are macros whose definitions may be obtained by including
.UL <ctype.h> .
.nr PD .4v
.LP
.UL isalpha(c)
returns non-zero if the argument is alphabetic.
.nr PD .4v
.LP
.UL isupper(c)
returns non-zero if the argument is upper-case alphabetic.
.nr PD .4v
.LP
.UL islower(c)
returns non-zero if the argument is lower-case alphabetic.
.nr PD .4v
.LP
.UL isdigit(c)
returns non-zero if the argument is a digit.
.nr PD .4v
.LP
.UL isspace(c)
returns non-zero if the argument is a spacing character:
tab, new-line, carriage return, vertical tab,
form feed, space.
.nr PD .4v
.LP
.UL ispunct(c)
returns non-zero if the argument is
any punctuation character, i.e., not a space, letter,
digit or control character.
.nr PD .4v
.LP
.UL isalnum(c)
returns non-zero if the argument is a letter or a digit.
.nr PD .4v
.LP
.UL isprint(c)
returns non-zero if the argument is printable: a letter,
digit, or punctuation character.
.nr PD .4v
.LP
.UL iscntrl(c)
returns non-zero if the argument is a control character.
.nr PD .4v
.LP
.UL isascii(c)
returns non-zero if the argument is an
.UC ASCII
character, i.e., less than octal 0200.
.nr PD .4v
.LP
.UL toupper(c)
returns the upper-case character corresponding to the lower-case
letter
.UL c .
.nr PD .4v
.LP
.UL tolower(c)
returns the lower-case character corresponding to the upper-case
letter
.UL c .
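.nr PD .4v
.LP
Since
.UL toupper
is described only for lower-case letters,
a careful program tests with
.UL islower
first.
A sketch, with the invented name
.UL upcase
and the function prototypes of later C compilers:
.P1
#include <ctype.h>

/* Convert s to upper case in place.
   Returns the number of letters that were changed. */
int upcase(char *s)
{
	int n = 0;

	for (; *s != 0; s++)
		if (islower((unsigned char) *s)) {
			*s = toupper((unsigned char) *s);
			n++;
		}
	return n;
}
.P2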
.sp
.I "June 1980"