.ds :? C Environment of UNIX/TS
.PH "''''"
.OH "'\s9\f2\*(:?\fP''\\\\nP\s0'"
.EH "'\s9\\\\nP''\f2\*(:?\^\fP\s0'"
'\"	nothing much fixed!
.hy 14
.ds q \s-1UNIX/TS\s+1
.ds m \f2U\s-1NIX/TS\s+1 User's Manual\^\fP
.tr ~
.nr Hb 3
.nr Hs 3
.nr Hu 4
.ds HF 3 3 2 3 2
.bd S B 3
.de Ds
.DS 1
.br
.lg 0
\!.lg 0
.ss 20
\!.ss 20
.br
..
.de De
.br
\!.ss 12
.ss 12
.lg
\!.lg
.br
.DE
..
.de I
.nr ;F \\n(.f\"save current font
.ft 2
.if \\n(.$ .if !\\n(.$-1 \&\\$1
.if \\n(.$-1 \&\\$1\^\c
.if \\n(.$ .ft\\n(;F\"back to saved font
.if \\n(.$-1 \&\\$2
..
.TL
The C Environment of U\s-2NIX/TS\s+2
.AU "Andrew R. Koenig" ARK MH
.MT 4
.H 1 "INTRODUCTION"
This document describes
differences users may encounter
when changing to
\s-1UNIX\s+1\(dg\s-1/TS\s+1
.FS \(dg
UNIX is a Trademark of Bell Laboratories.
.FE
from the various so-called ``UNIX Sixth Edition'' C compilers.
The document is intended as a conversion aid,
so the emphasis is on incompatibilities,
rather than new facilities.
.P
Note that this document is only a
guide;
refer to the \*m
for complete information.
.P
This version of this document supersedes all previous versions thereof.
.H 1 "LIBRARY CHANGES"
The changes that are most likely to be noticed are in the
run-time library.
The ``Standard I/O Library'' has been incorporated into
\f2/lib/libc\^\fP\f3.\fP\f2a\^\fP
along with the contents of
\f2/lib/liba\^\fP\f3.\fP\f2a\^\fP;
this latter library is gone.
.I Printf\^
has been rewritten into portable C; there
are a few incompatibilities with the old version.
Finally, there are a number of smaller changes and
incompatibilities.
.H 2 "Environments"
The system now makes available to the user program a table of
.I "environment variables" .
Each variable has a name and a value; both name and value are
character strings.
The values of environment variables are preserved across
.I fork\^
and
.I exec ;
they can also be altered easily using the shell and
somewhat less easily using the new
.I execle\^
and
.I execve\^
system calls.
.P
The new
.I getenv\^
function can be used to retrieve the value of an environment variable.
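.P
As a minimal sketch of retrieving a variable with
.I getenv\^
(the helper name
.I env_or\^
and its fallback argument are illustrative assumptions,
not part of the library):
.Ds
```c
#include <stdlib.h>

/* Return the value of the environment variable name,
 * or fallback when the variable is not set.
 * env_or is a hypothetical helper, not a library routine. */
const char *env_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);   /* null pointer if unset */
    return value != 0 ? value : fallback;
}
```
.De
Calling
.I env_or\^
with the arguments "TERM" and "dumb", for instance, would yield the
value of TERM if it is set and the string "dumb" otherwise.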
.H 2 "The Standard I/O Library"
In the past, there were two I/O libraries available.
One was documented by
.I "A New Input-Output Package\^"
(Ritchie),
and was made available
through the
.B \-lS
loader option.
The other, older one was made available whenever a
C program was being compiled; it was
characterized, among other things, by use of the names
.I fin\^
and
.I fout\^
to control disposition of standard input and output files.
.P
The older library has now vanished, along with the
.B \-lS
option.
All programs will receive the new I/O library without any
explicit action.
In addition, the libraries obtained by
.B \-lc
and
.B \-la
have been merged; this combined library is accessed (where needed) by
.B \-lc
or
.B \-l .
.H 2 "Printf"
In the interests of portability,
.I printf\^
has been rewritten into portable C.
This results in load modules some 1800 bytes larger
than previous versions.
.P
The correct way to write a long integer is now
.B %ld
or
.B %lo ;
the previous forms
.B D
and
.B O
are going away.
Retiring them frees the upper-case letters, so that the
.B X ,
.B E ,
and
.B G
format codes can indicate that any letters produced by
the conversion are to appear in upper case.
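.P
The long-integer form might be exercised as in the following sketch,
written in present-day C.
The helper name
.I format_long\^
is an assumption, each buffer is assumed to hold at least 24 characters,
and the combination of
.B l
with
.B X
is taken from current C rather than from this document.
.Ds
```c
#include <stdio.h>

/* Format value in decimal with %ld and in upper-case
 * hexadecimal with %lX.  Both buffers must be large enough
 * (24 characters is assumed to be ample for a long).
 * Returns the number of characters written to dec. */
int format_long(char *dec, char *hex, long value)
{
    sprintf(hex, "%lX", (unsigned long) value);
    return sprintf(dec, "%ld", value);
}
```
.De
Only the
.B l
modifier distinguishes a long conversion from an int one, which leaves
the upper-case letters free for their new meaning.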
.P
The
.B %r
format code has been removed; if you don't know what it was,
you don't want to know.
.H 2 "Scanf"
.H 3 "White space"
The way in which
.I scanf\^
treats white space has changed slightly.
No longer is it the case that
.I scanf\^
will skip white space in the input for each character
in the format.
Rather, a space, tab, or new-line in the format will match
optional white space in the input.
Thus:
.Ds
"alpha = %d"
.De
will match any of
.Ds
alpha=12
alpha  =12
alpha=  12
alpha  =  12
.De
but not
.Ds
a  lpha=12
.De
as was formerly the case.
Note that this change may require white space to be inserted in
format strings of formerly working programs to maintain
compatibility.
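.P
The matching rule can be tried out with a sketch such as the following,
which uses
.I sscanf\^
so that the input comes from a string; the helper name
.I parse_alpha\^
is an assumption made for illustration.
.Ds
```c
#include <stdio.h>

/* Parse a line of the form "alpha = <integer>".
 * The blanks around = in the format match optional
 * white space in the input.
 * Returns 1 and stores the number on success, 0 otherwise. */
int parse_alpha(const char *line, int *value)
{
    return sscanf(line, "alpha = %d", value) == 1;
}
```
.De
With this format, "alpha=12" and "alpha  =  12" are both accepted,
while "a  lpha=12" is rejected because the format has no white space
between the letters.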
.H 3 "Character class formats"
A character class format item (such as "%[0123456789]") is now permitted
to match a null string.
Thus,
.Ds
scanf (":%[^:]:", x);
.De
will no longer fail when presented with
.Ds
::
.De
.H 2 "Mathematical Routines"
The mathematical subroutines have been moved to a separate
library obtainable by the
.B \-lm
option.
Declarations for these routines can be obtained in
.B <math.h> .
.H 2 "Character class routines"
The routines that test character class
(\f2isdigit\^\fP,
etc.)
are no longer defined in
.B <stdio.h> ;
rather, they are defined in
.B <ctype.h> .
Thus, a line of the form
.Ds
#include <ctype.h>
.De
will have to be added to those programs which use
.I isdigit ,
.I isupper ,
.I islower ,
and their relatives.
.P
The domain of the character class routines has been extended
to match the range of
.I getc :
\-1 through 255.
.P
The character class routine
.I isprint\^
has been revised to conform to its documentation;
a space is now considered a printable character.
To determine if a character has a graphic representation,
use the (new) function
.I isgraph .
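.P
The distinction between
.I isprint\^
and
.I isgraph\^
can be sketched as follows; the helper name
.I count_blank_printables\^
is hypothetical, and the ASCII character set is assumed.
.Ds
```c
#include <ctype.h>

/* Count the characters in s that isprint accepts but
 * isgraph rejects; in ASCII that is just the space. */
int count_blank_printables(const char *s)
{
    int n = 0;
    for (; *s != 0; s++) {
        /* convert first, to stay inside the ctype domain */
        unsigned char c = (unsigned char) *s;
        if (isprint(c) && !isgraph(c))
            n++;
    }
    return n;
}
```
.De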
.H 2 "Character Conversion Routines"
The routines
.I toupper\^
and
.I tolower\^
have had their domain extended to the range of
.I getc :
.I toupper\^
will return its argument unchanged if that argument is not
a lower-case letter, and
.I tolower\^
will return its argument if it is not an upper-case letter.
This change required rewriting
.I toupper\^
and
.I tolower\^
as true subroutines, rather than macros;
for applications where efficiency is paramount
and the argument is already known to be a letter of the appropriate
case, the original macros have been renamed
.I _toupper\^
and
.I _tolower .
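.P
Because the subroutine form passes other characters through unchanged,
a whole string can be converted without testing each character first.
A minimal sketch, in which the helper name
.I upcase\^
is an assumption:
.Ds
```c
#include <ctype.h>

/* Convert s to upper case in place.  Digits, punctuation and
 * upper-case letters pass through unchanged, because toupper
 * returns such arguments as they are. */
char *upcase(char *s)
{
    char *p;
    for (p = s; *p != 0; p++)
        *p = (char) toupper((unsigned char) *p);
    return s;
}
```
.De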
.H 2 "Error Recovery"
The library now incorporates two new routines,
.I ssignal\^
and
.I gsignal .
In the future, these routines will be used by other routines in the
library to cause automatic program termination on detection of
various common errors, with the possibility of finer control as a
user option.
.P
This description is deliberately vague, as the facility is still
in the planning stage.
.H 2 "Time of Day"
There is a new function
.I tzset .
It is called with no arguments, and looks for an environment variable
.B TZ .
This variable is expected to be in the form
\f3EST\fP\f2n\^\fP
or
\f3EST\fP\f2n\^\fP\f3EDT\fP,
where
.I n\^
is a string of digits with an optional negative sign
and represents the difference, in hours, between the local time zone and GMT,
surrounded by the names of the local and (optional) daylight
time zones.
If
.I tzset\^
finds an environment variable
.B TZ
in this form, it sets the time zone parameters
.I timezone ,
.I tzname ,
and
.I daylight\^
appropriately.
.I Tzset\^
is now called automatically by
.I asctime ,
so it usually need not be called by the user.
.P
Note also that the variable
.I timezone\^
is now a
.B long ,
so programs referencing it will have to be changed slightly.
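.P
The effect of
.I tzset\^
can be observed with a sketch along the following lines.
It is written for a present-day system: the
.I setenv\^
call, the feature-test macro, and the assumption that
.I timezone\^
holds seconds are not taken from this document, and the helper name
.I offset_for\^
is hypothetical.
.Ds
```c
#define _XOPEN_SOURCE 700   /* assumed to expose setenv and timezone */
#include <stdlib.h>
#include <time.h>

/* Set TZ to zone (for example "EST5EDT"), call tzset,
 * and return the resulting value of timezone.
 * Returns -1 if the environment could not be changed. */
long offset_for(const char *zone)
{
    if (setenv("TZ", zone, 1) != 0)
        return -1L;
    tzset();
    return (long) timezone;     /* timezone is a long */
}
```
.De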
.H 2 "Miscellaneous"
.H 3 "chown"
.I Chown\^
now takes three arguments: the file name, the new owner, and
the new group.
This is necessary because owner and group can now each
be up to 16 bits.
.H 3 "tell"
.I Tell\^
is gone;
.I lseek\^
instead returns a value indicating the location sought.
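.P
A program that relied on
.I tell\^
might substitute something like the following sketch, which seeks zero
bytes from the current position; the helper name
.I current_offset\^
and the present-day header names are assumptions.
.Ds
```c
#include <sys/types.h>
#include <unistd.h>

/* Report the current offset of fd, as tell once did, by
 * seeking zero bytes from the current position.
 * Returns -1 on error (for example, a bad descriptor). */
long current_offset(int fd)
{
    return (long) lseek(fd, (off_t) 0, SEEK_CUR);
}
```
.De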
.H 3 "setexit and reset"
.I Setexit\^
and
.I reset\^
are gone; their function is taken over by
.I setjmp\^
and
.I longjmp .
These new routines provide all the facilities of
.I setexit\^
and
.I reset\^
in a more general form.
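.P
The following sketch shows the general pattern; the names
.I guarded ,
.I deep_work ,
.I recovery ,
and
.I failures\^
are all hypothetical.
.Ds
```c
#include <setjmp.h>

static jmp_buf recovery;    /* filled in by setjmp */
static int failures;        /* how often recovery was taken */

/* Stand-in for a deeply nested routine that hits an error. */
static void deep_work(int fail)
{
    if (fail)
        longjmp(recovery, 1);   /* unwind to the setjmp call */
}

/* Returns 0 if deep_work finished, 1 if it bailed out. */
int guarded(int fail)
{
    if (setjmp(recovery) != 0) {
        failures++;
        return 1;               /* arrived here via longjmp */
    }
    deep_work(fail);
    return 0;
}
```
.De
Here
.I setjmp\^
plays the part of
.I setexit ,
and
.I longjmp\^
the part of
.I reset ;
the saved context is named explicitly, so a program may keep several.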
.H 3 "nargs"
.I Nargs\^
is gone.
There is no replacement routine, as
.I nargs\^
cannot be made to work with separate I and D space.
.H 3 "String routines"
.I Strcatn ,
.I strcpyn ,
.I strcmpn ,
.I index ,
and
.I rindex\^
have been renamed
.I strncat ,
.I strncpy ,
.I strncmp ,
.I strchr ,
and
.I strrchr ,
respectively.
This follows the recommendations of the
C Standards Task Force, and also
allows compatibility with systems that require
distinct external names to differ within their
first six characters.
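.P
Two of the renamed routines are shown in use in the sketch below;
the helper names
.I last_component\^
and
.I copy_bounded\^
are assumptions made for illustration.
.Ds
```c
#include <string.h>

/* Return the part of path after the last slash, or path
 * itself when it has none.  strrchr was formerly rindex. */
const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != 0 ? slash + 1 : path;
}

/* Copy at most size-1 characters of src into dst and
 * terminate the result; size must be at least 1.
 * strncpy was formerly strcpyn. */
char *copy_bounded(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = 0;
    return dst;
}
```
.De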
.H 3 "Effective user and group ID"
There are two new routines,
.I geteuid\^
and
.I getegid ,
which return the effective user and group ID,
rather than the real user and group ID.
.H 3 "time"
The
.I time\^
routine now returns a
.B long
value; it will also store a copy of the value in the
(long) location addressed by its argument unless that
argument is
.B "(long \(**)0" .
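.P
Both ways of obtaining the value can be sketched as follows.
The sketch is in present-day C, which spells the type
.I time_t\^
where this document says
.B long ;
the helper names are hypothetical.
.Ds
```c
#include <time.h>

/* Fetch the time both ways: as the return value and through
 * the pointer argument.  Returns 1 when the copies agree. */
int time_both_ways(void)
{
    time_t stored = 0;
    time_t returned = time(&stored);    /* also fills in stored */
    return returned == stored;
}

/* Fetch the time without storing a copy anywhere. */
long seconds_now(void)
{
    return (long) time((time_t *) 0);
}
```
.De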
.H 3 "The password file"
The format of
.I /etc/passwd\^
has changed slightly with the introduction of
\s-1UNIX/TS\s+1;
this change is reflected in the various routines which extract
information from
.I /etc/passwd .
In addition, a new file,
.I /etc/group ,
has been created to hold information about group access privileges.
This file is searched by a new set of routines.
.P
The names of the routines under discussion are:
.Ds
endpwent	endgrent
getpwent	getgrent
getpwnam
getpwuid	getgrgid
setpwent	setgrent
.De
.H 1 "THE LANGUAGE"
.H 2 "The Preprocessor"
John Reiser has rewritten the C preprocessor.
The new one is largely compatible with the old one, and much
faster, but there are a few changes.
.H 3 "General"
Symbols defined on the command line by
\f3\-D\fP\f2foo\^\fP
are defined as
.B 1 ,
i.e., as if they had been defined by
.Ds
#define foo 1
.De
or
.Ds
\-Dfoo=1
.De
This means that names automatically defined by the
preprocessor (specifically
.I unix\^
and
.I pdp11 )
cannot be used as identifiers in the program without naming
them in
.B #undef
statements or using the
.B \-U
preprocessor option.
.P
The directory search order for
.B #include
requests is:
.AL 1 "" compact
.LI
the directory of the file which contains the
.B #include
request
(i.e.,
.B #include
is relative to the file being scanned when
the request is made), for statements of the form
.Ds
#include "\f2name\^\fP"
.De
.LI
the directories specified by
.B \-I ,
in left-to-right order (as usual, the null string can be used to
name the current directory)
.LI
the standard directory(s) (which for the \s-1UNIX\s+1 system is
.I /usr/include )
.LE
.P
An unescaped new-line
terminates a character constant or quoted string.
.P
An escaped new-line (a backslash immediately followed by
a new-line)
may be used in the body of a
.B #define
statement to continue
the definition onto the next line.
The escaped new-line is
not included in the macro body.
.P
Comments are uniformly removed (except if the argument
.B \-C
is specified).
They are also ignored, except that a comment terminates a token.
Thus
.Ds
foo/* la di da */bar
.De
may expand `foo' and `bar' but
will never expand `foobar'.
If neither `foo' nor `bar' is a
macro then the output is the string `foobar', even if
the preprocessor name `foobar' is defined as something else.
The file
.Ds
#define foo(a,b)b/**/a
foo(1,2)
.De
produces `21' because the comment causes a break which enables
the recognition of `b' and `a' as formals in the string "b/**/a".
.P
Macro formal parameters are recognized in
.B #define
bodies even inside
character constants and quoted strings.
The output from
.Ds
#define foo(a) '\e\ea'
foo(bar)
.De
is the seven characters " '\e\ebar'".
Macro names are not recognized
inside character constants or quoted strings during the regular scan.
Thus
.Ds
#define foo bar
printf("foo");
.De
does not expand `foo' in the second line, because it is inside
a quoted string which is not part of a
.B #define
macro definition.
.P
Macros are not expanded while processing a
.B #define
or
.B #undef .
Thus
.Ds
#define foo bletch
#define bar foo
#undef foo
bar
.De
produces `foo'.
The token appearing immediately after an
.B #ifdef
or
.B #ifndef
is not expanded (of course!).
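.P
The rule for
.B #define
and
.B #undef
can be checked by handing the example to the compiler proper,
as in the sketch below; the function name
.I read_foo\^
is an assumption.
.Ds
```c
#define foo bletch
#define bar foo     /* body is stored as the token foo */
#undef foo

int bar = 3;        /* declares a variable named foo */

int read_foo(void)
{
    return foo;     /* the variable declared just above */
}
```
.De
If the body of `bar' had been expanded when it was defined,
the variable would be named `bletch' and the reference to `foo'
would not compile.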
.P
Macros are not expanded during the scan which determines the actual
parameters to another macro call.
Thus
.Ds
#define foo(a,b)b a
#define bar hi
foo(bar,
#define bar bye
)
.De
produces " bye" (and warns about the redefinition of `bar').
.H 3 "Bugs fixed"
.AL 1 "" compact
.LI
"1.e4" is recognized as a floating-point number, rather than as an
opportunity to expand the possible macro name "e4".
.LI
Any kind and amount of white space (space, tab, line-feed, vertical tab,
form-feed, carriage return) is allowed between a macro name and
the left parenthesis which introduces its actual parameters.
.LI
The comma operator is legal in preprocessor
.B #if
statements.
.LI
Macros with parameters are legal in preprocessor
.B #if
statements.
.LI
Single-character character constants are legal in preprocessor
.B #if
statements.
.LI
Line-feeds are put out in the proper place when a multi-line comment
is not passed through to the output.
.LI
The following example expands to "# # #" :
.Ds
#define foo #
foo foo foo
.De
.LI
If the \-R flag is not specified then the invocation of some recursive
macros is trapped and the recursion forcibly terminated with an
error message.
The recursions that are trapped are the ones
in which the nesting level is non-decreasing from some point on.
In particular,
.Ds
#define a a
a
.De
will be detected.
(Use "#undef a" if that is what you want.)
.LI
The recursion
.Ds
#define a c b
#define b c a
#define c foo
a
.De
will not be detected because the nesting level decreases after
each expansion of "c".
.LI
The \-R flag specifically allows recursive macros and recursion will
be strictly obeyed (to the extent that space is available).
Assuming that \-R is specified:
.Ds
#define a a
a
.De
causes an infinite loop with very little output.
The tail recursion
.Ds
#define a <b
#define b >a
a
.De
causes the string "<>" to be output infinitely many times.
The non-tail recursion
.Ds
#define a b>
#define b a<
a
.De
complains "too much pushback", dumps the ``pushback'', and continues
(again, infinitely).
.LE
.H 3 "Stylistic choice"
.AL 1 "" compact
.LI
Nothing (not even line-feeds) is output while a false
.B #if ,
.B #ifdef ,
or
.B #ifndef
is in effect.
Thus when all conditions become true
a line of the form `# 12345 "foo.c"' is output.
.LI
Error and warning messages always appear on standard error (file
descriptor 2).
.LI
Mismatch between the number of formals and actuals in a macro call
produces only a warning, and not an error.
Excess actuals
are ignored; missing actuals are turned into null strings.
.LE
.P
.H 3 "Incompatibility"
The virgule '/' in "a=/*b" is interpreted as the first character of
the pair "/*" which introduces a comment, rather than as the
second character of the divide-and-replace operator "=/".
This incompatibility reflects the recent change in the C language
which made "a/=*b" the legal way to write such a statement
if the meaning "a=a/ *b" is intended.
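.P
A minimal sketch of the currently legal spelling, in which the
function name
.I divide_by\^
is hypothetical:
.Ds
```c
/* Divide a by the value b points to.  The operator is
 * written with the slash first, so that the slash and the
 * indirection star never sit next to each other. */
int divide_by(int a, const int *b)
{
    a /= *b;
    return a;
}
```
.De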
.H 2 "The Compiler"
.H 3 "Enumerated Data Types"
Enumerated data types are here, though not yet documented,
so that
.B enum
is now a keyword.
.H 3 "Unsigned numbers"
The value returned by
.I sizeof\^
is now
.I unsigned\^
rather than
.I int ,
so care must be exercised in the use of
.B sizeof
in a few strange cases.
For example, the following no longer works:
.Ds
if (n < \- sizeof (x)) { ... }
.De
because unary \- applied to an
.B unsigned
value yields another (large) unsigned value, not a negative number.
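.P
One way to keep such a test working is to convert the size to
.B int
before negating it, as in this sketch (the function name
.I below_negative_size\^
is an assumption, and the size is assumed to fit in an
.B int ):
.Ds
```c
/* Return 1 if n is below minus the size of a long.
 * The cast turns the unsigned result of sizeof into an int
 * before negation, so the comparison is a signed one. */
int below_negative_size(int n)
{
    return n < -(int) sizeof (long);
}
```
.De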
.H 3 "Structure and Union Assignments"
It is now possible to assign structures and pass them as arguments and
results of procedures.
This feature is not new in the latest release, but it is sufficiently
important that it is worth noting anyway.
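.P
A short sketch of all three uses follows; the structure
.I point\^
and the function
.I shifted\^
are hypothetical names.
.Ds
```c
struct point { int x; int y; };

/* Take a structure by value and return a modified copy. */
struct point shifted(struct point p, int dx, int dy)
{
    struct point q;
    q = p;          /* whole-structure assignment */
    q.x += dx;
    q.y += dy;
    return q;       /* structure returned as a result */
}
```
.De
Since the argument is passed by value, the caller's copy is
left unchanged.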
.H 1 "SOURCE STRUCTURE"
The new preprocessor and changes in the library make the source structure
of this new release of C different from previous versions.
.H 2 "The Compiler"
The new preprocessor consists of three source modules:
\f2cpp\^\fP\f3.\fP\f2c\^\fP,
\f2cpy\^\fP\f3.\fP\f2y\^\fP,
and
\f2yylex\^\fP\f3.\fP\f2c\^\fP.
\f2Cpy\^\fP\f3.\fP\f2y\^\fP
should be processed by
.I yacc\^
to produce
\f2cpy\^\fP\f3.\fP\f2c\^\fP;
this and
\f2cpp\^\fP\f3.\fP\f2c\^\fP
should then be compiled together to produce the preprocessor.
Despite its name,
\f2yylex\^\fP\f3.\fP\f2c\^\fP
does not involve using
.I lex ,
and it is not directly compiled;
rather, it is named by
.B #include s
in the other modules.
.H 2 "The Library"
The source for the mathematical routines in the C library
is now in
.I /usr/src/lib/libm .
The source in
.I /usr/src/lib/libc\^
is now organized in five subdirectories:
.AL 1 "" compact
.LI
.I crt ,
which contains run-time routines that are invoked by
generated object code without ever being explicitly
referenced by the programmer.
These routines are largely in assembler language, and do things
like
.B long
multiplication and division.
.LI
.I csu ,
which contains routines that are explicitly referenced by the
.I cc\^
command;
these routines are used for run-time initialization.
.LI
.I gen ,
which contains those routines described in section 3 of the manual
that are not part of the ``standard I/O package'',
.LI
.I stdio ,
which contains those routines described in section 3 of the manual
that
.I are\^
part of the ``standard I/O package'', and
.LI
.I sys ,
which contains the routines described in section 2 of the manual.
These routines are all in assembler language, and are interfaces
between the C language and the \s-1UNIX\s+1 system calls.
.LE
.P
The other files in the
.I /usr/src/lib/libc\^
directory
are used as part of the installation procedures.
\f2Order\^\fP\f3.\fP\f2in\^\fP
and
\f2order\^\fP\f3.\fP\f2out\^\fP
are used to define the ordering of the modules in
\f2/lib/libc\^\fP\f3.\fP\f2a\^\fP,
and
\f2libc\^\fP\f3.\fP\f2rc\^\fP
is a command file to recompile the library.
.sp
.I "May 1979"