.ds :? C Environment of UNIX/TS
.PH "''''"
.OH "'\s9\f2\*(:?\fP''\\\\nP\s0'"
.EH "'\s9\\\\nP''\f2\*(:?\^\fP\s0'"
'\" nothing much fixed!
.hy 14
.ds q \s-1UNIX/TS\s+1
.ds m \f2U\s-1NIX/TS\s+1 User's Manual\^\fP
.tr ~
.nr Hb 3
.nr Hs 3
.nr Hu 4
.ds HF 3 3 2 3 2
.bd S B 3
.de Ds
.DS 1
.br
.lg 0
\!.lg 0
.ss 20
\!.ss 20
.br
..
.de De
.br
\!.ss 12
.ss 12
.lg
\!.lg
.br
.DE
..
.de I
.nr ;F \\n(.f\"save current font
.ft 2
.if \\n(.$ .if !\\n(.$-1 \&\\$1
.if \\n(.$-1 \&\\$1\^\c
.if \\n(.$ .ft\\n(;F\"back to saved font
.if \\n(.$-1 \&\\$2
..
.TL
The C Environment of U\s-2NIX/TS\s+2
.AU "Andrew R. Koenig" ARK MH
.MT 4
.H 1 "INTRODUCTION"
This document describes differences users may encounter
when changing to \s-1UNIX\s+1\(dg\s-1/TS\s+1
.FS \(dg
UNIX is a Trademark of Bell Laboratories.
.FE
from the various so-called ``UNIX Sixth Edition'' C compilers.
The document is intended as a conversion aid,
so the emphasis is on incompatibilities, rather than new facilities.
.P
Note that this document is only a guide;
refer to the \*m for complete information.
.PP
This version of this document supersedes all previous versions thereof.
.H 1 "LIBRARY CHANGES"
The changes that are most likely to be noticed are in the run-time library.
The ``Standard I/O Library'' has been incorporated into
\f2/lib/libc\^\fP\f3.\fP\f2a\^\fP
along with the contents of
\f2/lib/liba\^\fP\f3.\fP\f2a\^\fP;
this latter library is gone.
.I Printf\^
has been rewritten into portable C;
there are a few incompatibilities with the old version.
Finally, there are a number of smaller changes and incompatibilities.
.H 2 "Environments"
The system now makes available to the user program a table of
.I "environment variables" .
Each variable has a name and a value;
both name and value are character strings.
The values of environment variables are preserved across
.I fork\^
and
.I exec ;
they can also be altered easily using the shell
and somewhat less easily using the new
.I execle\^
and
.I execve\^
system calls.
.P
The new
.I getenv\^
function can be used to retrieve the value of an environment variable.
.H 2 "The Standard I/O Library"
In the past, there were two I/O libraries available.
One was documented by
.I "A New Input-Output Package\^"
(Ritchie), and was made available through the
.B \-lS
loader option.
The other, older one was made available whenever a C program
was being compiled; it was characterized, among other things,
by use of the names
.I fin\^
and
.I fout\^
to control disposition of standard input and output files.
.P
The older library has now vanished, along with the
.B \-lS
option.
All programs will receive the new I/O library without any explicit action.
In addition, the libraries obtained by
.B \-lc
and
.B \-la
have been merged; this combined library is accessed (where needed) by
.B \-lc
or
.B \-l .
.H 2 "Printf"
In the interests of portability,
.I printf\^
has been rewritten into portable C.
This results in load modules some 1800 bytes larger than previous versions.
.P
The correct way to write a long integer is now
.B %ld
or
.B %lo ;
the previous forms
.B D
and
.B O
are going away.
The purpose of this is to permit
.B X ,
.B E ,
and
.B G
format codes for indicating that the letters produced by the format code
are to appear in upper case.
.P
The
.B %r
format code has been removed;
if you don't know what it was, you don't want to know.
.H 2 "Scanf"
.H 3 "White space"
The way in which
.I scanf\^
treats white space has changed slightly.
No longer is it the case that
.I scanf\^
will skip white space in the input for each character in the format.
Rather, a space, tab, or new-line in the format
will match optional white space in the input.
Thus:
.Ds
"alpha = %d"
.De
will match any of
.Ds
alpha=12
alpha =12
alpha= 12
alpha = 12
.De
but not
.Ds
a lpha=12
.De
as was formerly the case.
Note that this change may require white space to be inserted
in format strings of formerly working programs to maintain compatibility.
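.P
As an illustration of the white-space rule just described, the following sketch applies the same format string through
.I sscanf .
The helper name
.I parse_alpha\^
is hypothetical and is not part of the library;
the sketch also assumes a C library whose
.I sscanf\^
treats white space in the format as described in this section.
```c
#include <stdio.h>

/*
 * parse_alpha is a hypothetical helper, not a library routine.
 * It returns 1 and stores the number in *value if the line
 * matches the format "alpha = %d"; otherwise it returns 0.
 * Each space in the format matches optional white space
 * in the input, including none at all.
 */
int parse_alpha(const char *line, int *value)
{
    return sscanf(line, "alpha = %d", value) == 1;
}
```
.P
Under those assumptions,
.I parse_alpha\^
accepts each of the four inputs listed above and rejects "a lpha=12",
because the format contains no white space between the `a' and the `l'.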
.H 3 "Character class formats"
A character class format item (such as "%[0123456789]")
is now permitted to match a null string.
Thus,
.Ds
scanf (":%[^:]:", x);
.De
will no longer fail when presented with
.Ds
::
.De
.H 2 "Mathematical Routines"
The mathematical subroutines have been moved to a separate library
obtainable by the
.B \-lm
option.
Declarations for these routines can be obtained in
.B <math.h> .
.H 2 "Character class routines"
The routines that test character class (\f2isdigit\^\fP, etc.)
are no longer defined in
.B <stdio.h> ;
rather, they are defined in
.B <ctype.h> .
Thus, a line of the form
.Ds
#include <ctype.h>
.De
will have to be added to those programs which use
.I isdigit ,
.I isupper ,
.I islower ,
and their relatives.
.PP
The domain of the character class routines has been extended
to match the range of
.I getc :
\-1 through 255.
.PP
The character class routine
.I isprint\^
has been revised to conform to its documentation;
a space is now considered a printable character.
To determine if a character has a graphic representation,
use the (new) function
.I isgraph .
.H 2 "Character Conversion Routines"
The routines
.I toupper\^
and
.I tolower\^
have had their domain extended to the range of
.I getc :
.I toupper\^
will return its argument unchanged if that argument
is not a lower-case letter, and
.I tolower\^
will return its argument unchanged if that argument
is not an upper-case letter.
This change required rewriting
.I toupper\^
and
.I tolower\^
as true subroutines, rather than macros;
for applications where efficiency is paramount
and the argument is already known to be a letter of the appropriate case,
the original macros have been renamed
.I _toupper\^
and
.I _tolower .
.H 2 "Error Recovery"
The library now incorporates two new routines,
.I ssignal\^
and
.I gsignal .
In the future, these routines will be used by other routines in the library
to cause automatic program termination on detection of various common errors,
with the possibility of finer control as a user option.
.P
This description is deliberately vague,
as the facility is still in the planning stage.
.H 2 "Time of Day"
There is a new function
.I tzset .
It is called with no arguments, and looks for an environment variable
.B TZ .
This variable is expected to be in the form
\f3EST\fP\f2n\^\fP
or
\f3EST\fP\f2n\^\fP\f3EDT\fP,
where
.I n\^
is a string of digits with an optional negative sign
and represents the difference between the local time zone and GMT,
surrounded by the names of the local and (optional) daylight time zones.
If
.I tzset\^
finds an environment variable
.B TZ
in this form, it sets the time zone parameters
.I timezone ,
.I tzname ,
and
.I daylight\^
appropriately.
.I Tzset\^
is now called automatically by
.I asctime ,
so it usually need not be called by the user.
.P
Note also that the variable
.I timezone\^
is now a
.B long ,
so programs referencing it will have to be changed slightly.
.H 2 "Miscellaneous"
.H 3 "chown"
.I Chown\^
now takes three arguments: the file name, the new owner, and the new group.
This is necessary because owner and group can now each be up to 16 bits.
.H 3 "tell"
.I Tell\^
is gone;
.I lseek\^
instead returns a value indicating the location sought.
.H 3 "setexit and reset"
.I Setexit\^
and
.I reset\^
are gone; their function is taken over by
.I setjmp\^
and
.I longjmp .
These new routines provide all the facilities of
.I setexit\^
and
.I reset\^
in a more general form.
.H 3 "nargs"
.I Nargs\^
is gone.
There is no replacement routine, as
.I nargs\^
cannot be made to work with separate I and D space.
.H 3 "String routines"
.I Strcatn ,
.I strcpyn ,
.I strcmpn,\^
.I index,\^
and
.I rindex\^
have been renamed
.I strncat ,
.I strncpy ,
.I strncmp ,
.I strchr ,
and
.I strrchr ,
respectively.
This follows the recommendations of the C Standards Task Force,
and also allows compatibility with systems that require
distinct external names to differ within their first six characters.
.H 3 "Effective user and group ID"
There are two new routines,
.I geteuid\^
and
.I getegid ,
which return the effective user and group ID,
rather than the real user and group ID.
.H 3 "time"
The
.I time\^
routine now returns a
.B long
value; it will also store a copy of the value in the (long) location
addressed by its argument unless that argument is
.B "(long \(**)0" .
.H 3 "The password file"
The format of
.I /etc/passwd\^
has changed slightly with the introduction of \s-1UNIX/TS\s+1;
this change is reflected in the various routines
which extract information from
.I /etc/passwd .
In addition, a new file,
.I /etc/group ,
has been created to hold information about group access privileges.
This file is searched by a new set of routines.
.P
The names of the routines under discussion are:
.Ds
endpwent endgrent
getpwent getgrent
getpwnam
getpwuid getgrgid
setpwent setgrent
.De
.H 1 "THE LANGUAGE"
.H 2 "The Preprocessor"
John Reiser has rewritten the C preprocessor.
The new one is largely compatible with the old one, and much faster,
but there are a few changes.
.H 3 "General"
Symbols defined on the command line by \f3\-D\fP\f2foo\^\fP are defined as
.B 1 ,
i.e., as if they had been defined by
.Ds
#define foo 1
.De
or
.Ds
\-Dfoo=1
.De
This means that names automatically defined by the preprocessor
(specifically
.I unix\^
and
.I pdp11 )
cannot be used as identifiers in the program without naming them in
.B #undef
statements or using the
.B \-U
preprocessor option.
.P
The directory search order for
.B #include
requests is:
.AL 1 "" compact
.LI
the directory of the file which contains the
.B #include
request (i.e., the name in the
.B #include
is relative to the file being scanned when the request is made),
for statements of the form
.Ds
#include "\f2name\^\fP"
.De
.LI
the directories specified by
.B \-I ,
in left-to-right order
(as usual, the null string can be used to name the current directory)
.LI
the standard directory(s)
(which for the \s-1UNIX\s+1 system is
.I /usr/include )
.LE
.P
An unescaped new-line terminates a character constant or quoted string.
.P
An escaped new-line (a backslash immediately followed by a new-line)
may be used in the body of a
.B #define
statement to continue the definition onto the next line.
The escaped new-line is not included in the macro body.
.P
Comments are uniformly removed (except if the argument
.B \-C
is specified).
They are also ignored, except that a comment terminates a token.
Thus
.Ds
foo/* la di da */bar
.De
may expand `foo' and `bar' but will never expand `foobar'.
If neither `foo' nor `bar' is a macro then the output is the string `foobar',
even if the preprocessor name `foobar' is defined as something else.
The file
.Ds
#define foo(a,b)b/**/a
foo(1,2)
.De
produces `21' because the comment causes a break
which enables the recognition of `b' and `a' as formals in the string "b/**/a".
.P
Macro formal parameters are recognized in
.B #define
bodies even inside character constants and quoted strings.
The output from
.Ds
#define foo(a) '\e\ea'
foo(bar)
.De
is the seven characters " '\e\ebar'".
Macro names are not recognized inside character constants
or quoted strings during the regular scan.
Thus
.Ds
#define foo bar
printf("foo");
.De
does not expand `foo' in the second line,
because it is inside a quoted string which is not part of a
.B #define
macro definition.
.P
Macros are not expanded while processing a
.B #define
or
.B #undef .
Thus
.Ds
#define foo bletch
#define bar foo
#undef foo
bar
.De
produces `foo'.
The token appearing immediately after an
.B #ifdef
or
.B #ifndef
is not expanded (of course!).
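.P
Two of the rules above can be checked from within a C program;
the following is only a sketch under stated assumptions.
It relies on the `#' stringizing operator of later ISO C preprocessors,
which is not a feature described in this document,
and the names
.I STR ,
.I STR_ ,
.I expanded_bar ,
.I quoted_name ,
and
.I greeting\^
are hypothetical.
```c
#include <string.h>

/* Hypothetical two-level helper that turns the expansion of x into a string. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define foo bletch
#define bar foo
#undef foo

/*
 * The body of bar was stored unexpanded, so bar expands to foo;
 * foo is no longer a macro by the time bar is used, so the
 * result is the string "foo" and not "bletch".
 */
const char *expanded_bar(void)
{
    return STR(bar);
}

#define greeting hello

/* Macro names inside a quoted string are left alone. */
const char *quoted_name(void)
{
    return "greeting";
}
```
.P
With a preprocessor that follows these rules,
.I expanded_bar\^
returns the string "foo" and
.I quoted_name\^
returns the string "greeting".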
.P
Macros are not expanded during the scan which determines
the actual parameters to another macro call.
Thus
.Ds
#define foo(a,b)b a
#define bar hi
foo(bar,
#define bar bye
)
.De
produces " bye" (and warns about the redefinition of `bar').
.H 3 "Bugs fixed"
.AL 1 "" compact
.LI
"1.e4" is recognized as a floating-point number,
rather than as an opportunity to expand the possible macro name "e4".
.LI
Any kind and amount of white space
(space, tab, line-feed, vertical tab, form-feed, carriage return)
is allowed between a macro name and the left parenthesis
which introduces its actual parameters.
.LI
The comma operator is legal in preprocessor
.B #if
statements.
.LI
Macros with parameters are legal in preprocessor
.B #if
statements.
.LI
Single-character character constants are legal in preprocessor
.B #if
statements.
.LI
Line-feeds are put out in the proper place
when a multi-line comment is not passed through to the output.
.LI
The following example expands to "# # #" :
.Ds
#define foo #
foo foo foo
.De
.LI
If the \-R flag is not specified then the invocation of some recursive macros
is trapped and the recursion forcibly terminated with an error message.
The recursions that are trapped are the ones in which
the nesting level is non-decreasing from some point on.
In particular,
.Ds
#define a a
a
.De
will be detected.
(Use "#undef a" if that is what you want.)
.LI
The recursion
.Ds
#define a c b
#define b c a
#define c foo
a
.De
will not be detected because the nesting level decreases
after each expansion of "c".
.LI
The \-R flag specifically allows recursive macros
and recursion will be strictly obeyed (to the extent that space is available).
Assuming that \-R is specified:
.Ds
#define a a
a
.De
causes an infinite loop with very little output.
The tail recursion
.Ds
#define a <b
#define b >a
a
.De
causes the string "<>" to be output infinitely many times.
The non-tail recursion
.Ds
#define a b>
#define b a<
a
.De
complains "too much pushback", dumps the ``pushback'',
and continues (again, infinitely).
.LE
.H 3 "Stylistic choice"
.AL 1 "" compact
.LI
Nothing (not even line-feeds) is output while a false
.B #if ,
.B #ifdef ,
or
.B #ifndef
is in effect.
Thus when all conditions become true a line of the form
`# 12345 "foo.c"' is output.
.LI
Error and warning messages always appear on standard error (file descriptor 2).
.LI
Mismatch between the number of formals and actuals in a macro call
produces only a warning, and not an error.
Excess actuals are ignored; missing actuals are turned into null strings.
.LE
.P
.H 3 "Incompatibility"
The virgule '/' in "a=/*b" is interpreted as the first character
of the pair "/*" which introduces a comment,
rather than as the second character of the divide-and-replace operator "=/".
This incompatibility reflects the recent change in the C language
which made "a/=*b" the legal way to write such a statement
if the meaning "a=a/ *b" is intended.
.H 2 "The Compiler"
.H 3 "Enumerated Data Types"
Enumerated data types are here, though not yet documented, so that
.B enum
is now a keyword.
.H 3 "Unsigned numbers"
The value returned by
.I sizeof\^
is now
.I unsigned\^
rather than
.I int ,
so care must be exercised in the use of
.B sizeof
in a few strange cases.
For example, the following no longer works:
.Ds
if (n < \- sizeof (x)) { ... }
.De
because unary \- is meaningless when applied to an
.B unsigned
value.
.H 3 "Structure and Union Assignments"
It is now possible to assign structures
and pass them as arguments and results of procedures.
This feature is not new in the latest release,
but it is sufficiently important that it is worth noting anyway.
.H 1 "SOURCE STRUCTURE"
The new preprocessor and changes in the library make the source structure
of this new release of C different from previous versions.
.H 2 "The Compiler"
The new preprocessor consists of three source modules:
\f2cpp\^\fP\f3.\fP\f2c\^\fP,
\f2cpy\^\fP\f3.\fP\f2y\^\fP,
and
\f2yylex\^\fP\f3.\fP\f2c\^\fP.
\f2Cpy\^\fP\f3.\fP\f2y\^\fP
should be processed by
.I yacc\^
to produce
\f2cpy\^\fP\f3.\fP\f2c\^\fP;
this and
\f2cpp\^\fP\f3.\fP\f2c\^\fP
should then be compiled together to produce the preprocessor.
Despite its name,
\f2yylex\^\fP\f3.\fP\f2c\^\fP
does not involve using
.I lex ,
and it is not directly compiled; rather, it is named by
.B #include s
in the other modules.
.H 2 "The Library"
The source for the mathematical routines in the C library is now in
.I /usr/src/lib/libm .
The source in
.I /usr/src/lib/libc\^
is now organized in five subdirectories:
.AL 1 "" compact
.LI
.I crt ,
which contains run-time routines that are invoked by generated object code
without ever being explicitly referenced by the programmer.
These routines are largely in assembler language, and do things like
.B long
multiplication and division.
.LI
.I csu ,
which contains routines that are explicitly referenced by the
.I cc\^
command; these routines are used for run-time initialization.
.LI
.I gen ,
which contains those routines described in section 3 of the manual
that are not part of the ``standard I/O package'',
.LI
.I stdio ,
which contains those routines described in section 3 of the manual that
.I are\^
part of the ``standard I/O package'', and
.LI
.I sys ,
which contains the routines described in section 2 of the manual.
These routines are all in assembler language,
and are interfaces between the C language and the \s-1UNIX\s+1 system calls.
.LE
.P
The other files in the
.I /usr/src/lib/libc\^
directory are used as part of the installation procedures.
\f2Order\^\fP\f3.\fP\f2in\^\fP
and
\f2order\^\fP\f3.\fP\f2out\^\fP
are used to define the ordering of the modules in
\f2/lib/libc\^\fP\f3.\fP\f2a\^\fP,
and
\f2libc\^\fP\f3.\fP\f2rc\^\fP
is a command file to recompile the library.
.sp
.I "May 1979"