.ds s \\s8
.ds S \\s0
.ds * \v'.2m'*\v'-.2m'
.tr ~.
.ds . \s14~\s0
.tr _\(ul
.de sn
.sp
.ft I
.ne 2
..
.de sN
.sp .5
.ft R
..
.TL
The Portable C Library (on \s-2UNIX\s0) *
.AU
M. E. Lesk
.AI
.MH
.SH
1. INTRODUCTION
.PP
The C language [1] now exists on three operating systems.
.FS
* This document is an abbreviated form of
``The Portable C Library'', by M. E. Lesk, describing only
the UNIX section of the library.
.FE
A set of library routines common to
\*sPDP\*S 11 \*sUNIX\*S, Honeywell 6000 \*sGCOS\*S,
and \*sIBM\*S 370 \*sOS\*S
has been provided to improve program portability.
This memorandum describes the \*sUNIX\*S implementation of
the portable routines.
.PP
The programs
defined here were chosen to follow the standard routines
available on \*sUNIX\*S, with alterations to improve transferability
to other computer systems.  It is expected that future C implementations
will try to support the basic library outlined in this document.
It provides character stream input and output on multiple files;
simple accessing of files by name; and some elementary
formatting and translating routines.
The remainder of this memorandum lists the portable and
non-portable library routines and explains some of the programming
aids available.
.PP
The I/O routines in the C library
fall into several classes.
Files are addressed through intermediate numbers called
.I
file-descriptors
.R
which are described in section 2.  Several default file-descriptors
are provided by the system; other aspects of the system
environment are explained in section 3.
.PP
Basic character-stream input and output involves the reading or writing of
files considered as streams of characters.
The C library
includes facilities for this, discussed
in section 4.
Higher-level character stream operations permit translation of
internal binary representations of numbers to and from
character representations, and formatting or unpacking of character
data.
These operations are performed with the subprograms in section 5.
Binary input and output routines permit data transmission
without the cost of translation to or from readable \*sASCII\*S character representations.
Such data transmission
should only be directed to files or tapes, and not to
printers or terminals.  As is usual with such routines, the
only simple guarantee that can be made to the programmer seeking
portability is that data written by a particular sequence of
binary writes, if read by the exactly matching sequence of binary reads,
will restore the previous contents of memory.
Other reads or writes have system-dependent effects.
See section 6 for a discussion of binary input and output.
Section 7 describes some further routines in the portable library.
These include a storage allocator and
some other control and conversion functions.
.SH
2. FILE DESCRIPTORS
.PP
Except for the standard input and output files, all files must
be explicitly opened before
any I/O is performed on them.
When files are opened for writing, they are created if not already present.
They must be closed when finished, although the normal
.I
cexit
.R
routine will take care of that.
When a disk file or device is opened, it is associated with a file descriptor, an integer
between 0 and 9.
This file descriptor is used for further I/O to the file.
.PP
Initially you are given three file descriptors by the system: 0, 1, and 2.
File 0 is the standard input; it is normally the teletype
in time-sharing or input data cards in batch.
File 1 is the standard output; it is normally the
teletype in time-sharing or the line printer in batch.
File 2 is the error file; it is an output file, normally the same
as file 1, except that when file 1 is diverted via a command
line `>' operator, file 2 remains attached to the
original destination, usually the terminal.
It is used for error message output.
These popular \*sUNIX\*S conventions are considered part
of the C library specification.
By closing 0 or 1, the default input or output may be re-directed;
this can also be done on the command line by
.ft I
>file
.ft P
for output or
.ft I
<file
.ft P
for input.
.PP
Associated with the portable library
are two external integers,
named
.I
cin
.R
and
.I
cout.
.R
These are respectively the numbers of the
standard input unit and standard output unit.
Initially 0 and 1 are used, but you may redefine
them at any time.
These cells are used by the routines
.I
getchar,
putchar,
gets,
.R
and
.I
puts
.R
to select their I-O unit number.
.SH
3. THE C ENVIRONMENT
.PP
The C language is almost exactly the same on all machines,
except for essential machine differences such as word
length and number of characters per word.
On \*sUNIX\*S, the
\*sASCII\*S character code is used.
Characters range from \(mi128 to +127 in numeric value;
there is sign extension when characters are assigned to integers,
and right shifts are arithmetic.
The ``first'' character in a word is stored in the right half word.
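.PP
The sign extension and arithmetic right shift just described can be checked
with a short sketch.
It assumes a present-day C compiler and spells out \fIsigned char\fR,
since a plain \fIchar\fR is not signed on every machine;
the names \fIwiden\fR and \fIshift_right\fR are hypothetical.
.DS
```c
/* Sketch: widening a character to an integer with sign extension.
   signed char is spelled out because plain char may be unsigned
   on other machines; widen() is a hypothetical helper name. */
int widen(signed char c)
{
    int i = c;      /* a negative character stays negative */
    return i;
}

/* An arithmetic right shift keeps the sign bit: on a machine that
   shifts arithmetically, shift_right(-8, 1) is -4. */
int shift_right(int v, int n)
{
    return v >> n;
}
```
.DE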
.PP
More serious problems of compatibility are caused by the loaders
on the different operating systems.
.PP
\*sUNIX\*S permits external names to be in
upper and lower case,
up to seven characters long.
There may be multiple external definitions (uninitialized) of
the same name.
.PP
The C alphabet for identifier names includes
the upper and lower case letters, the digits, and the underline.
It is not possible for C programs to communicate with
\*sFORTRAN\*S programs.
.SH
4. BASIC CHARACTER STREAM ROUTINES
.PP
These routines transfer streams of characters in and out of
C programs.  Interpretation of the characters is left to
the user.
Facilities for interpreting numerical strings are presented in section
5; and routines to transfer binary data to and from files or devices
are discussed in section 6.
In the following routine descriptions, the optional
argument
.ft I
fd
.ft P
represents a file-descriptor; if not present, it is taken
to be 0 for input and 1 for output.  When your program
starts, remember that these are associated with the ``standard''
input and output files.
.sn
COPEN (filename, type)
.sN
\fICopen\fR initiates activity on a file; if necessary it will
create the file too.
Up to 10 files may be open at one time.
When called as described here,
.ft I
copen
.ft R
returns a file descriptor for a character stream file.
Values less than zero returned by
.I
copen
.R
indicate an error trying to open the file.
Other calls to
.ft I
copen
.ft R
are described in sections 6 and 7.
.sp 3p
Arguments:
.sp 3p
.ft I
Filename:
.ft P
a string representing a file name, according to the
local operating system conventions.
All three systems accept a string of letters and digits as a legal file name,
although leading digits are not recommended on \*sGCOS\*S.
.br
.ft I
Type:
.ft R
a character `r', `w', or `a' meaning read, write, or append.
Note that the type is a single character, whereas the
file name must be a string.
.sn
CGETC ( fd )
.sN
.ft I
Cgetc
.ft R
returns the next character from the input unit associated with \fIfd\fR. 
On end of file \fIcgetc\fR returns `\\0'.
To signal end of file from the teletype,
type the special symbol appropriate to \*sUNIX\*S:
EOT (control-D).
.sn
CPUTC (ch , fd )
.sN
.ft I
Cputc
.ft R
writes a character onto the given output unit.
.ft I
Cputc
.ft R
returns as its value the character written.
.PP
Output for disk files is buffered
in 512-character units, irrespective
of newlines;
teletype output goes character by character.
.sn
CCLOSE (fd)
.sN
Activity on file
.ft I
fd
.ft R
is terminated and
any output buffers are emptied.
You usually don't have to call
.ft I
cclose; cexit
.ft R
will
do it for you on all open files.
However, to write some data
on a file and then read it back in, the correct sequence is:
.DS
fd = copen (``file'', `w');
write on fd ...
cclose (fd);
fd = copen(``file'', `r');
read from fd ...
.DE
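.PP
The sequence above can be sketched with present-day stdio calls.
This is only an analogue under stated assumptions:
\fIfopen\fR, \fIfputc\fR, \fIfclose\fR and \fIfgetc\fR stand in for
\fIcopen\fR, \fIcputc\fR, \fIcclose\fR and \fIcgetc\fR,
and the name \fIwrite_then_read\fR is hypothetical.
.DS
```c
#include <stdio.h>

/* Sketch of the close-then-reopen sequence using the present-day
   stdio calls fopen, fputc, fclose and fgetc as stand-ins for
   copen, cputc, cclose and cgetc (an assumption, not the portable
   library itself).  Returns the first character read back, or -1
   on any failure. */
int write_then_read(const char *name, int ch)
{
    FILE *fp;
    int got;

    fp = fopen(name, "w");          /* create or truncate */
    if (fp == NULL)
        return -1;
    fputc(ch, fp);
    if (fclose(fp) != 0)            /* closing empties the buffer */
        return -1;

    fp = fopen(name, "r");          /* reopen for reading */
    if (fp == NULL)
        return -1;
    got = fgetc(fp);
    fclose(fp);
    remove(name);                   /* tidy up the scratch file */
    return got;
}
```
.DE
.PP
The close between the write and the read is the step that matters:
until the buffer is emptied, the character may not yet be in the file.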
.sn
CFLUSH (fd)
.sN
To get buffer flushing, but retain the ability to write more
on the file, you may call this routine.
.PP
Normally, output intended
for the teletype is not buffered
and this call is not needed.
.sn
CEXIT ([errcode])
.sN
.ft I
Cexit
.ft R
closes all files and then terminates execution.
If a non-zero argument is given, this is assumed to be
an error indication or other returned value to be signalled
to the operating system.
.PP
.I
Cexit
.R
.B
must
.R
be called explicitly; a return
from the main program is not
adequate.
.sn
CEOF (fd)
.sN
.ft I
Ceof
.ft R
returns nonzero when end of file has been reached on input unit \fIfd.\fR
.sn
GETCHAR ()
.sN
.ft I
Getchar
.ft R
is a special case of \fIcgetc;\fR it reads
one character from the standard input unit.
.ft I
Getchar (\|)
.ft R
is defined as
.ft I
cgetc (cin);
.ft R
it should not have an argument.
.sn
PUTCHAR (ch)
.sN
.ft I
Putchar (ch)
.ft R
is the same as
.ft I
cputc (ch, cout);
.ft R
it writes one character on the standard output.
.sn
GETS (s)
.sN
.ft I
Gets
.ft R
reads everything up to the next newline into the
string pointed to by \fIs.\fR  If the last character read
from this input unit was newline, then \fIgets\fR reads
the next line, which on \*sGCOS\*S and \*sIBM\*S corresponds exactly
to a logical record.
The terminating newline is replaced by `\\0'.
The value of \fIgets\fR is \fIs,\fR or 0 if end of file.
.sn
PUTS (s)
.sN
Copies the string \fIs\fR onto the standard output unit. 
The terminating `\\0' is replaced by a newline character.
The value of
.I
puts
.R
is
.I
s.
.R
.sn
UNGETC (ch , fd)
.sN
.ft I
Ungetc
.ft R
pushes back its character argument to the unit
.ft I
fd,
.ft R
which must be open for input.
After
.ft I
ungetc (`a', fd); ungetc (`b', fd);
.ft R
the next two characters
to be read from \fIfd\fR will be `b' and then `a'.
Up to 100 characters may be pushed back on each file.
This subroutine permits a program to read past the end
of its input, and then restore it for the next routine to read.
It is impossible to change an external file with
.ft I
ungetc;
.ft R
its purpose is only for internal communications, most particularly
.ft I
scanf,
.ft R
which is described in section 5.
Note that
.ft I
scanf
.ft R
actually requires only one character
of ``unget'' capability; thus it is possible that
future implementors may change the specification of the
.ft I
ungetc
.ft R
routine.
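.PP
The last-in, first-out order of pushed-back characters can be pictured
as a small stack kept for each file.
The sketch below is an illustration only, not the library's code;
the names \fIpushbuf\fR, \fIunget\fR and \fInextch\fR are hypothetical,
and the limit of 100 is taken from the description above.
.DS
```c
#define PUSHMAX 100             /* limit quoted in the text */

/* Sketch of a per-file pushback stack; the names pushbuf, unget
   and nextch are hypothetical, not the library's own. */
struct pushbuf {
    int n;                      /* characters currently held */
    char c[PUSHMAX];
};

/* Push ch back; returns ch, or -1 if the stack is full. */
int unget(struct pushbuf *p, int ch)
{
    if (p->n >= PUSHMAX)
        return -1;
    p->c[p->n++] = (char)ch;
    return ch;
}

/* Return the most recently pushed character, or -1 if none is
   held (a real reader would then go to the file itself). */
int nextch(struct pushbuf *p)
{
    if (p->n == 0)
        return -1;
    return p->c[--p->n];
}
```
.DE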
.SH
5. HIGH-LEVEL CHARACTER STREAM ROUTINES
.PP
These two routines,
.ft I
printf
.ft R
for output and
.ft I
scanf
.ft R
for input, permit simple translation to and from character
representations of numerical quantities.
They also allow generation or interpretation of formatted
lines.
.sn
PRINTF (\|[fd, ] control-string, arg1, arg2, ...)
.sp .5
PRINTF (\|[\(mi1, output-string, ] control-string, arg1, arg2, ...)
.sN
.ft I
Printf
.ft R
converts, formats, and prints its arguments under control of the
control string.
The control string contains two types of objects: plain characters,
which are simply copied to the output stream, and conversion
specifications, each of which causes conversion and printing of the
next successive argument to \fIprintf\fR.
.sp
Each conversion specification is introduced by the character `%'.
Following the `%', there may be:
.in 5
\(em an optional minus sign `\(mi' which specifies left adjustment of the
converted argument in the indicated field;
.sp 3p
\(em an optional digit string specifying a minimum field width; if the converted
argument has fewer characters than the field width it will be padded
on the left (or right, if the left
adjustment indicator has been given) to make up the field width;
the padding character is blank normally and zero if the field
width was specified with a leading zero
(note that this does not imply an octal field width);
.sp 3p
\(em an optional period `.' which serves to separate the field width
from the next digit string;
.sp 3p
\(em an optional digit string (the precision) which specifies the maximum
number of characters to be printed from a string,
or the number of digits to be printed to the right
of the decimal point of a floating or double number.
.sp 3p
\(em an optional length modifier \(fml\(fm which indicates that the corresponding
data item is a
.I
long
.R
rather than an
.I
int.
.R
.sp 3p
\(em a character which indicates the type of conversion to be applied.
.in 0
.sp
The conversion characters and their meanings are:
.in 5
.sp 5p
.ti -0.2i
\fBd\fR	The argument is converted to decimal notation.
.sp 3p
.ti -0.2i
\fBo\fR	The argument is converted to octal notation.
.sp 3p
.ti -0.2i
\fBx\fR	The argument is converted to hexadecimal notation.
.sp 3p
.ti -0.2i
\fBu\fR	The argument is converted to unsigned decimal notation.
This is only implemented (or useful) on \*sUNIX\*S.
.sp 3p
.ti -0.2i
\fBc\fR	The argument is taken to be a single character.
.sp 3p
.ti -0.2i
\fBs\fR	The argument is taken to be a string and characters from the
string are printed until a null character is reached or until the
number of characters indicated by the precision specification
is exhausted.
.sp 3p
.ti -0.2i
\fBe\fR	The argument is taken to be a float or double
and converted to decimal notation of the form
.ft I
[\(mi]\|m.nnnnnnE\|[\(mi]\|xx
.ft R
where the length of the string of \fIn\fP's is specified by
the precision.
The default precision is 6 and the maximum is 22.
.sp 3p
.ti -0.2i
\fBf\fR	The argument is taken to be a float or double and
converted to decimal notation of the
form
.ft I
[\(mi]mmm.nnnnn
.ft R
where the length of the string of
\fIn\fR's is specified by
the precision.  The default precision is 6 and the maximum
is 22.
Note that the precision does not determine
the number of significant digits printed in \fBf\fR format.
.in 0
.sp 3p
If no recognizable conversion character appears after the `%', that character
is printed; thus `%' may be printed by use of the string ``%%''.
.PP
As an example of \fIprintf,\fR the following program fragment
.DS
.I
int i, j; float x; char \**s;
i = 35; j=2; x= 1.732; s = ``ritchie'';
printf (``%d %f %s\\n'', i, x, s);
printf (``%o, %4d or %\(mi4d%5.5s\\n'', i, j, j, s);
.R
.DE
would print
.DS
.I
.cs I 24
35 1.732000 ritchie
043,    2 or 2   ritch
.cs I
.R
.DE
.PP
If \fIfd\fR is not specified, output is to unit
.I
cout.
.R
It is possible to direct output to a string instead of to a file.
This is indicated by \(mi1 as the first argument.
The second argument should be a pointer to the string.
\fIPrintf\fR will put a terminating `\\0' onto the string.
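.PP
Output to a string can be tried on a present-day system with
\fIsnprintf\fR, used here as a stand-in for the \(mi1 form of \fIprintf\fR.
The sketch repeats the field-width and precision conversions
from the example above; the name \fIformat_line\fR is hypothetical.
.DS
```c
#include <stdio.h>

/* Sketch: formatting into a string with the present-day snprintf,
   used here as a stand-in for the printf (-1, string, ...) form.
   Returns the number of characters the conversion produced. */
int format_line(char *out, size_t size, int j, const char *s)
{
    /* %4d pads on the left, %-4d pads on the right, and %5.5s
       prints exactly five characters of s when s is longer. */
    return snprintf(out, size, "%4d or %-4d%5.5s", j, j, s);
}
```
.DE
.PP
Called with \fIj\fR equal to 2 and \fIs\fR pointing to ``ritchie'',
the sketch fills the string with the same 17 characters shown after
the comma in the second line of output above.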
.sn
SCANF (\|[fd, ] control-string, arg1, arg2, ...)
.sp .5
SCANF (\|[\(mi1, input-string, ] control-string, arg1, arg2, ...)
.sN
.ft I
Scanf
.ft R
reads characters, interprets
them according to a format, and stores the results in its arguments.
It expects as arguments:
.in 5
.ti -0.2i
1.	An optional file-descriptor or input-string, indicating the source of the input
characters; if omitted, file
.I
cin
.R
is used;
.ti -0.2i
2.	A control string, described below;
.ti -0.2i
3.	A set of arguments,
.ft I
each of which must be a pointer,
.ft R
indicating where the converted input should be stored.
.in 0
.sp 5p
The
control string
usually contains
conversion specifications, which are used to direct interpretation
of input sequences.
The control string may contain:
.sp 5p
.in 5
.ti -0.2i
1.	Blanks, tabs or newlines, which are ignored.
.ti -0.2i
2.	Ordinary characters (not %) which are expected to match
the next non-space character of the input stream
(where space characters are defined as blank, tab or newline).
.ti -0.2i
3.	Conversion specifications, consisting of the
character %, an optional assignment-suppressing character \**,
an optional numerical maximum field width, and a conversion
character.
.in 0
.sp 5p
A conversion specification is used to direct the conversion of the
next input field; the result
is placed in the variable pointed to by the corresponding argument,
unless assignment suppression was
indicated by the \** character.
An input field is defined as a string of non-space characters;
it extends either to the next space character or until the field
width, if specified, is exhausted.
.PP
The conversion character indicates the interpretation of the
input field; the corresponding pointer argument must
usually be of a restricted type.
Pointers, rather than variable names, are required
by the ``call-by-value'' semantics of the C language.
The following conversion characters are legal:
.in 5
.ti -0.2i
\fB%\fR	indicates that a single % character is expected
in the input stream at this point;
no assignment is done.
.tr @ 
.ti -0.2i
\fBd\fR	indicates that a decimal integer is expected
in the input stream;
the corresponding argument should be an integer pointer.
.ti -0.2i
\fBo\fR	indicates that an octal integer is expected in the
input stream; the corresponding argument should be an integer pointer.
.ti -0.2i
\fBx\fR	indicates that a hexadecimal integer is expected in the input stream;
the corresponding argument should be an integer pointer.
.ti -0.2i
\fBs\fR	indicates that a character string is expected;
the corresponding argument should be a character pointer
pointing to an array of characters large enough to accept the
string and a terminating `\\0', which will be added.
The input field is terminated by a space character
or a newline.
.ti -0.2i
\fBc\fR	indicates that a single character is expected; the
corresponding argument should be a character pointer;
the next input character is placed at the indicated spot.
The normal skip over space characters is suppressed
in this case;
to read the next non-space character, try
.ft I
%1s.
.ft R
.ti -0.2i
\fBe\fR or \fBf\fR indicates that a floating point number is expected in the input stream;
the next field is converted accordingly and stored through the
corresponding argument, which should be a pointer to a float.
The input format for
.I
floats
.R
is
a string of digits
possibly containing a decimal point, followed by an optional
exponent field containing an E or e followed by a possibly signed integer.
.ti -0.2i
\fB[\fR	indicates a string not to be delimited by space characters.
The left bracket is followed by a set of characters and a right
bracket; the characters between the brackets define a set
of characters making up the string.
If the first character
is not circumflex (\|^\|), the input field
is all characters until the first character not in the set between
the brackets; if the first character
after the left bracket is ^, the input field is all characters
until the first character which is in the remaining set of characters
between the brackets.
The corresponding argument must point to a character array.
.in 0
.PP
The conversion characters
.I
d, o
.R
and
.I
x
.R
may be preceded by 
.I
l
.R
to indicate that a pointer to
.I
long
.R
rather than
.I
int
.R
is expected.
Similarly, the conversion characters
.I
e
.R
or
.I
f
.R
may be preceded by
.I
l
.R
to indicate that a pointer to 
.I
double
.R
rather than 
.I
float
.R
is in the argument list.
The character
.I
h
.R
will function similarly in the future to indicate
.I
short
.R
data items.
.PP
For example, the call
.in 10
.nf
.ft I
int i; float x; char name[50];
scanf ( ``%d%f%s'', &i, &x, name);
.ft R
.in 0
with the input line
.in 10
.ft I
25   54.32E\(mi1  thompson
.ft R
.in 0
.fi
will assign to
.ft I
i
.ft R
the value
25,
.ft I
x
.ft R
the value 5.432, and
.ft I
name
.ft R
will contain
.ft I
``thompson\\0''.
.ft R
Or,
.ft I
.nf
.in 10
int i; float x; char name[50];
scanf (``%2d%f%\**d%[1234567890]'', &i, &x, name);
.ft R
.in 0
with input
.ft I
.in 10
56789 0123 56a72
.ft R
.in 0
.fi
will assign 56 to
.ft I
i,
.ft R
789.0 to
.ft I
x,
.ft R
skip ``0123'',
and place the string ``56\\0'' in
.ft I
name.
.ft R
The next call to \fIcgetc\fR will return `a'.
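.PP
The second example can be reproduced on a present-day system with
\fIsscanf\fR reading from a string in memory.
Two changes are assumptions of this sketch rather than part of the
library described here: a blank is added before the bracket conversion,
because present-day \fIsscanf\fR does not skip leading space characters
for that conversion, and a width of 49 guards the 50-character array.
The name \fIscan_fields\fR is hypothetical.
.DS
```c
#include <stdio.h>

/* Sketch: the second example above, run through the present-day
   sscanf on a string in memory.  Returns the number of items
   assigned; the suppressed %*d field is not counted.  name must
   have room for at least 50 characters. */
int scan_fields(const char *input, int *i, float *x, char *name)
{
    return sscanf(input, "%2d%f%*d %49[0123456789]", i, x, name);
}
```
.DE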
.PP
.ft I
Scanf
.ft R
returns as its value the number of successfully matched and assigned input
items.
This can be used to decide how many input items were found.
On end of file, \(mi1 is returned; note that this is different
from 0, which means that the next input character does not
match what you called for in the control string.
.ft I
Scanf,
.ft R
if given a first argument of \(mi1, will scan a string in memory
given as the second argument.  For example,
if you want to read up to four numbers from an input line
and find out how many there were, you could try
.ft I
.in 5
.nf
.ne 3
int a[4], amax;
char line[100];
amax = scanf (\(mi1, gets(line), ``%d%d%d%d'', &a[0], &a[1], &a[2], &a[3]);
.in 0
.ft R
.fi
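.PP
A present-day analogue of this call uses \fIsscanf\fR on a line already
held in memory.
The sketch below is an assumption-laden stand-in, not the library routine;
the name \fIcount_numbers\fR is hypothetical, and the end-of-input value
is folded into 0 so that the result is always a count.
.DS
```c
#include <stdio.h>

/* Sketch: count how many decimal integers (at most four) begin a
   line held in memory, using sscanf in place of scanf (-1, ...).
   Returns 0 for an empty or non-numeric line. */
int count_numbers(const char *line, int a[4])
{
    int n = sscanf(line, "%d%d%d%d", &a[0], &a[1], &a[2], &a[3]);
    return n < 0 ? 0 : n;       /* sscanf gives EOF on empty input */
}
```
.DE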
.SH
6. BINARY STREAM ROUTINES
.PP
These routines write binary data, not translated to printable characters.
They are normally efficient
but do not produce files that can be printed or easily interpreted.
No special information is added
to the records and thus they can be handled by
other programming systems
.ft I
if
.ft R
you make the departure from portability required
to tell the other system how big a C item (integer, float,
structure, etc.) really is in machine units.
.sn
COPEN (name, direction, ``i'')
.sN
When
.ft I
copen
.ft R
is called with a third argument as above, a binary stream
file descriptor is returned.
Such a file descriptor is required for the remaining
subroutines in this section, and may not be used with
the routines in the preceding two sections.
The first two arguments operate exactly as described in section
4;
further details are given in section 7.
On \*sUNIX\*S, however, an ordinary file descriptor may be used for binary I-O,
but binary and character I-O may not be mixed
unless
.I
cflush
.R
is called at each switch to binary I-O.
The third
argument to
.I
copen
.R
is ignored.
.sn
CWRITE (\|ptr, sizeof(\**ptr), nitems, fd\|)
.sN
.ft I
Cwrite
.ft R
writes
.ft I
nitems
.ft R
of data beginning at
.ft I
ptr
.ft R
on file
.ft I
fd.
Cwrite
.ft R
writes blocks of binary information, not translated to
printable form, on a file.  It is intended for machine-oriented
bulk storage of intermediate data.  Any kind of data may
be written with this command, but only the corresponding
.ft I
cread
.ft R
should be expected to make any sense of it on return.
The first argument is a pointer to the beginning of a vector
of any kind of data.  The second argument
tells
.ft I
cwrite
.ft R
how big the items are.
The third argument specifies the number
of the items to be written; the fourth indicates where.
.sn
CREAD (\|ptr, sizeof(\**ptr), nitems, fd\|)
.sN
.ft I
Cread
.ft R
reads up to
.ft I
nitems
.ft R
of data from file
.ft I
fd
.ft R
into a buffer beginning at
.ft I
ptr.
Cread
.ft R
returns the number of items read.
.br
.ne 2i
The returned
number of items will be equal to the number requested by
.I
nitems
.R
except when reading certain devices (e.g. the teletype or magnetic tape)
or reading the final bytes of a disk file.
.LP
Again, the second argument indicates the size
of the data items being read.
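.PP
The guarantee that a matching sequence of binary reads restores what
was written can be tried with present-day stdio.
In this sketch \fIfwrite\fR and \fIfread\fR are assumed stand-ins for
\fIcwrite\fR and \fIcread\fR, with the same order of arguments;
the name \fIbinary_roundtrip\fR is hypothetical.
.DS
```c
#include <stdio.h>

/* Sketch: write nitems integers in binary and read them back with
   the matching read, using the present-day fwrite and fread as
   stand-ins for cwrite and cread.  Returns the number of items
   read back, or -1 if the file cannot be opened. */
int binary_roundtrip(const char *name, const int *src, int *dst, int nitems)
{
    FILE *fp;
    size_t got;

    fp = fopen(name, "wb");
    if (fp == NULL)
        return -1;
    fwrite(src, sizeof(*src), (size_t)nitems, fp);
    fclose(fp);                     /* empty the output buffer */

    fp = fopen(name, "rb");
    if (fp == NULL)
        return -1;
    got = fread(dst, sizeof(*dst), (size_t)nitems, fp);
    fclose(fp);
    remove(name);                   /* tidy up the scratch file */
    return (int)got;
}
```
.DE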
.sn
CCLOSE (fd)
.sN
The same description applies as for
character-stream files.
.SH
7. OTHER PORTABLE ROUTINES
.LP
.sn
REW (fd)
.sN
Rewinds unit
.I
fd.
.R
Buffers are emptied properly and the file
is left open.
.sn
SYSTEM (string)
.sN
The given
.I
string
.R
is executed as if it were typed at the terminal.
.sn
NARGS (\|)
.sN
A subroutine can call
this function to try to find out how many arguments it was called with.
Normally,
.I
nargs()
.R
returns the number of arguments
plus 3 for every
.I
float
.R
or
.I
double
.R
argument
and plus one for every
.I
long
.R
argument.
If the new \*sUNIX\*S feature of separated instruction and data space
areas is used,
.I
nargs()
.R
doesn't work at all.
.sn
CALLOC (n, sizeof(object))
.sN
.I
Calloc
.R
returns a pointer to new storage, allocated in space
obtained from the operating system.
The space obtained is
well enough aligned for any use, i.e.
for a double-precision number.
Enough space to store
.I
n
.R
objects of the size
indicated by the second argument
is provided.
The
.I
sizeof
.R
is executed at compile time; it is not in the library.
A returned value of \(mi1 indicates failure to obtain space.
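.PP
For comparison, a sketch using the \fIcalloc\fR found in present-day
C libraries follows.
It is an assumption that the reader has such a library;
there, failure is reported by a null pointer rather than \(mi1,
and the space is released with \fIfree\fR.
The name \fInew_vector\fR is hypothetical.
.DS
```c
#include <stdlib.h>

/* Sketch: allocate room for n integers with the present-day
   calloc, which returns a null pointer on failure rather than the
   -1 described above.  The caller releases the space with free. */
int *new_vector(size_t n)
{
    int *v = calloc(n, sizeof(*v));   /* storage arrives zeroed */
    return v;                         /* NULL means no space */
}
```
.DE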
.sn
CFREE (ptr, n, sizeof(\**ptr))
.sN
.I
Cfree
.R
returns to the operating system
memory starting at
.I
ptr
.R
and extending for
.I
n
.R
units of the size given by the third argument.
The space should have been obtained through \fIcalloc\fR.
On \*sUNIX\*S you can
only return the exact amount of space
obtained by 
.I
calloc;
.R
the second and third arguments are ignored.
.sn
FTOA (floating-number, char-string, precision, format\|)
.sN
.ft I
Ftoa
.ft R
(floating to \*sASCII\*S conversion)
converts floating point numbers to character strings.
The
.ft I
format
.ft R
argument should be either `f' or `e'; `e' is the default.
See the explanation of
.ft I
printf
.ft R
in section 5 for a description of the result.
.sn
ATOF (char-string)
.sN
Returns a floating value equal to the
value of the \*sASCII\*S character string argument,
interpreted as a decimal floating point number.
.sn
TMPNAM (str)
.sN
This routine places in the character array expected as its argument
a string
which is legal to use as a file name
and which is guaranteed to be unique
among all jobs executing on the computer at the same
time.  It is thus appropriate for use as a temporary
file name, although
the user may wish to move it into an appropriate directory.
The value of the function is the address
of the string.
.sn
ABORT (code)
.sN
Causes your program to terminate abnormally, which typically
results in a dump by the operating system.
.sn
INTSS ()
.sN
This routine tells you whether you are running in
foreground or background.
The definition of ``foreground'' is that
the standard input is the terminal.
.sn
WDLENG (\|)
.sN
This returns 16 on \*sUNIX\*S.
C users should be aware that the preprocessor
normally provides a defined symbol suitable for distinguishing the local system;
thus on
.SM
UNIX
.NL
the symbol
.I
unix
.R
is defined
before compilation of your program begins.