.fp 3 G
.TL
C Reference Manual
.AU
Dennis M. Ritchie
.AI
.MH
.sp
May 1, 1977
.PP
.so manmacs
.EQ
delim $$
.EN
.FS
.nh
Revised June, 1978 by R. Miller, University of Wollongong.
.hy
.FE
.SH
.ti 0
1. Introduction
.LP
C is a computer language which offers a rich selection of
operators and data types
and the ability to impose useful structure
on both control flow and data.
All the basic operations and data objects
are close to those actually implemented by most real computers,
so that a very efficient implementation is possible,
but the design is not tied to any particular machine
and with a little care it is possible to write easily portable programs.
.PP
This manual describes the current version of the C language
as it exists on the \*(pd, the Honeywell 6000,
the \s8IBM\s10 System/370,
.MC
and the \*I 16-bit and 32-bit series.
.mc
Where differences exist, it concentrates on the \*(pd
.MC
and \*I,
.mc
but tries to point out implementation-dependent details.
With few exceptions, these dependencies follow directly
from the underlying properties of the hardware;
the various compilers are generally quite compatible.
.bp
.SH
2. Lexical conventions
.LP
Blanks, tabs, newlines, and comments as described below
are ignored except as they serve to separate tokens.
Some space is required to separate otherwise adjacent
identifiers, keywords, and constants.
.PP
If the input stream has been parsed into tokens
up to a given character, the next token is taken
to include the longest string of characters
which could possibly constitute a token.
.SH
2.1 Comments
.LP
The characters
.Bd /\**
introduce a comment, which terminates with the characters
.Bd \**/ "" .
Comments do not nest.
.SH
2.2 Identifiers (Names)
.LP
An identifier is a sequence of letters and digits;
the first character must be alphabetic.
The underscore `\(ru' counts as alphabetic.
Upper and lower case letters are considered different.
.MC
No more than the first eight characters are significant
(although more may be used).
External identifiers, which are used by various assemblers and loaders,
are more restricted:
.TS
center;
l l .
DEC \*(pd	7 characters, 2 cases
Honeywell 6000	6 characters, 1 case
IBM 360/370	7 characters, 1 case
\*I \*(I2	8 characters, 2 cases
\*I \*(I1	6 characters, 1 case
.TE
.mc
.SH
2.3 Keywords
.LP
The following identifiers are reserved for use as keywords,
and may not be used otherwise:
.DS L
.TS
center;
LfG LfG LfG .
int	extern	else
char	register	for
float	typedef	do
double	static	while
struct	goto	switch
union	return	case
long	sizeof	default
short	break	entry
unsigned	continue
auto	if
.TE
.DE
The
.Bd entry
keyword is not currently implemented by any compiler
but is reserved for future use.
Some implementations also reserve the words
.Bd fortran
.MC
and
.Bd asm.
.mc
.SH
2.4 Constants
.LP
There are several kinds of constants,
.MC
as described below.
Hardware differences between implementations are summarized in \(sc2.6.
.mc
.SH
2.4.1 Integer constants
.LP
An integer constant consisting of a sequence of digits
is taken to be octal if it begins with \fG0\fR (digit zero),
decimal otherwise.
The digits \fG8\fR and \fG9\fR have octal value 10 and 11 respectively.
A sequence of digits preceded by
.Bd 0x
or
.Bd 0X
(digit zero) is taken to be a hexadecimal integer.
The hexadecimal digits include
.Bd a
or
.Bd A
through
.Bd f
or
.Bd F
with values 10 through 15.
A decimal constant whose value exceeds the largest
.MC
signed machine integer is taken to be
.mc
.Bd long "" ;
an octal or hex constant which exceeds the largest unsigned machine integer
.MC
is likewise taken to be
.mc
.Bd long.
.SH
2.4.2 Explicit long constants
.LP
A decimal, octal, or hexadecimal integer constant immediately followed by
.Bd l
(letter ell) or
.Bd L
.MC
is a long constant.
As discussed below, on some machines
.mc
integer and long values may be considered identical.
.SH
2.4.3 Character constants
.LP
A character constant is a sequence of characters enclosed in single quotes
.Bd \|\(aa ` '.
Within a character constant a single quote must be preceded by
a backslash `\e'.
Certain non-graphic characters, and `\e' itself,
may be escaped according to the following table:
.DS L
.TS
center;
L L .
\s8BS\s10	\eb
\s8NL (LF)\s10	\en
\s8CR\s10	\er
\s8HT\s10	\et
\s8FF\s10	\ef
\fIddd\fR	\e\fIddd\fR
\e	\e\e
.TE
.DE
The escape `\e\fIddd\|\fR' consists of the backslash followed by
1, 2, or 3 octal digits which are taken to specify
the value of the desired character.
A special case of this construction is `\e0'
(not followed by a digit) which indicates the character
.SM
NUL.
.NL
If the character following a backslash is not one of those specified,
the backslash vanishes.
.PP
The value of a single-character constant is the numerical value of the
.MC
character in the machine's character set
(\s8ASCII\s10 for the \*I and \*(pd).
.mc
On the \*(pd at most two characters are permitted in a character constant
and the second character of a pair is stored
in the high-order byte of the integer value.
.MC
On the \*I up to two (\*(I1) or four (\*(I2) characters are permitted,
and are stored right-justified in a word.
.mc
Character constants with more than one character
are inherently machine-dependent and should be avoided.
.SH
2.4.4 Floating constants
.LP
A floating constant consists of an integer part, a decimal point,
a fraction part, an
.Bd e
or
.Bd E,
and an optionally signed integer exponent.
The integer and fraction parts both consist of a sequence of digits.
Either the integer part or the fraction part (not both) may be missing;
either the decimal point or the \fGe\fR and the exponent (not both)
may be missing.
Every floating constant is taken to be double-precision.
.SH
2.5 Strings
.LP
A string is a sequence of characters surrounded by double quotes `\|"\|'.
A string has type `array of characters' and storage class `static'
(see below) and is initialized with the given characters.
The compiler places a null byte `\|\e0\|' at the end of each string
so that programs which scan the string can find its end.
In a string, the character `\|"\|' must be preceded by a `\e'\|;
in addition, the same escapes as described for character constants
may be used.
Finally, a `\e' and an immediately following new-line are ignored.
.PP
All strings, even when written identically, are distinct.
.ne 13
.SH
2.6 Hardware Characteristics
.LP
.TS
c c c c c c
c c c c c c
l l l l l l .
	DEC	Honeywell	IBM	\*I	\*I
	\*(pd	6000	370	\*(I1	\*(I2
	ASCII	ASCII	EBCDIC	ASCII	ASCII
char	8 bits	9 bits	8 bits	8 bits	8 bits
int	16	36	32	16	32
short	16	36	16	16	16
long	32	36	32	32	32
float	32	36	32	32	32
double	64	72	64	32	32
range	$\(+-10 sup \(+-38$	$\(+-10 sup \(+-38$	$\(+-10 sup \(+-76$\ 	$\(+-10 sup \(+-76$	$\(+-10 sup \(+-76$
.TE
.SH
3. Syntax notation
.LP
In the syntax notation used in this manual,
syntactic categories are indicated by
.MC
\fIunderlining\fR,
and literal words and characters in \fGbold\fR type.
.mc
Alternatives are listed on separate lines.
An optional terminal or non-terminal symbol is indicated
by the subscript `opt,' so that
.SY
{ expression\*(op }
.ES
would indicate an optional expression in braces.
The complete syntax is given in \(sc16, in the notation of YACC.
.SH
4. What's in a Name?
.LP
C bases the interpretation of an identifier
upon two attributes of the identifier: its
.It "storage class"
and its
.It type.
The storage class determines the location and lifetime
of the storage associated with an identifier;
the type determines the meaning of the values
found in the identifier's storage.
.PP
There are four declarable storage classes:
automatic, static, external, and register.
Automatic variables are local to each invocation of a block,
and are discarded upon exit from the block;
static variables are local to a block,
but retain their values upon reentry to a block
even after control has left the block;
external variables exist and retain their values
throughout the execution of the entire program,
and may be used for communication between functions,
even separately compiled functions.
Register variables are (if possible) stored
in the fast registers of the machine;
like automatic variables they are local to each block
and disappear on exit from the block.
.PP
C supports several fundamental types of objects:
.PP
Objects declared as characters
.Bd (char)
are large enough to store any member
of the implementation's character set,
and if a genuine character is stored in a character variable,
its value is equivalent to the integer code for that character.
Other quantities may be stored into character variables, but
.MC
the implementation is machine-dependent.
.mc
.PP
Up to three sizes of integer, declared
.Bd "short int,"
.Bd int,
and
.Bd "long int"
are available.
Longer integers provide no less storage than shorter ones,
but the implementation may make either short integers,
or long integers, or both equivalent to plain integers.
`Plain' integers have the natural size suggested
by the host machine architecture;
the other sizes are provided to meet special needs.
On the \*(pd and \*I,
.MC
integers are represented in 2's complement notation.
.mc
.PP
Unsigned integers, declared
.Bd unsigned,
obey the laws of arithmetic modulo $2 sup n$
where $n$ is the number of bits in the representation.
.MC
On the \*I and \*(pd, long and short unsigned quantities are not supported.
.mc
.PP
Single precision floating point (\fGfloat\fR)
.MC
and double-precision floating-point (\fGdouble\fR) quantities are available.
The \*I implementations currently make
.mc
.Bd float
and
.Bd double
synonymous.
.PP
Because objects of these types can usefully be interpreted as numbers,
they will be referred to as
.It arithmetic
types.
Types
.Bd char
and
.Bd int
of all sizes will collectively be called
.It integral
types.
.Bd Float
and
.Bd double
will collectively be called
.It floating
types.
.PP
Besides the fundamental arithmetic types there is
a conceptually infinite class of derived types
constructed from the fundamental types in the following ways:
.IP
.It arrays
of objects of most types;
.IP
.It functions
which return objects of a given type;
.IP
.It pointers
to objects of a given type;
.IP
.It structures
containing a sequence of objects of various types;
.IP
.It unions
capable of containing any one of several objects of various types.
.LP
In general these methods of constructing objects
can be applied recursively.
.SH
5. Objects and lvalues
.LP
An
.It object
is a manipulatable region of storage; an
.It lvalue
is an expression referring to an object.
An obvious example of an lvalue expression is an identifier.
There are operators which yield lvalues:
for example, if E is an expression of pointer type,
then \**E is an lvalue expression
referring to the object to which E points.
The name `lvalue' comes from the assignment expression `$E1@=@E2$'
in which the left operand E1 must be an lvalue expression.
The discussion of each operator below indicates
whether it expects lvalue operands and whether it yields an lvalue.
.SH
6. Conversions
.LP
A number of operators may, depending on their operands,
cause conversion of the value of an operand from one type to another.
This section explains the result to be expected from such conversions.
\(sc6.6 summarizes the conversions demanded by most ordinary operators;
it will be supplemented as required by the discussion of each operator.
.SH
6.1 Characters and integers
.LP
A character or a short integer may be used wherever an integer may be used.
In all cases the value is converted to an integer.
Conversion of a short integer always involves sign extension;
short integers are signed quantities.
Whether or not sign-extension occurs for characters is machine dependent,
but it is guaranteed that a member of the standard character set
is non-negative.
On the \*(pd, character variables range in value from \-128 to 127;
a character constant specified using an octal escape
also suffers sign extension and may appear negative, for example
.MC
\|\(aa\|\e377\(aa\| when converted to an integer becomes \-1.
.mc
.PP
When a longer integer is converted to a shorter or to a
.Bd char,
it is truncated on the left.
.SH
6.2 Float and double
.LP
All floating arithmetic in C is carried out in double-precision;
whenever a \fGfloat\fR appears in an expression
it is lengthened to \fGdouble\fR by zero-padding its fraction.
When a \fGdouble\fR must be converted to \fGfloat\fR,
for example by an assignment,
the \fGdouble\fR is rounded before truncation to \fGfloat\fR length.
.SH
6.3 Floating and integral
.LP
Conversions of floating values to integral type
.MC
tend to be rather machine-dependent;
in particular, the direction of truncation of negative numbers
varies from machine to machine.
.mc
The result is undefined if the value will not fit in the space provided.
.PP
Conversions of integral values to floating type are well behaved.
Some loss of precision occurs if the destination lacks sufficient bits.
.SH
6.4 Pointers and integers
.LP
An integer or long integer may be added to or subtracted from a pointer;
in such a case the first is converted
as specified in the discussion of the addition operator.
.PP
Two pointers to objects of the same type may be subtracted;
in this case the result is converted to an integer
as specified in the discussion of the subtraction operator.
.SH
6.5 Unsigned
.LP
Whenever an unsigned integer and a plain integer are combined,
the plain integer is converted to unsigned and the result is unsigned.
.MC
The value is the least unsigned integer congruent to the signed integer
(modulo $2 sup wordsize$).
In a 2's complement representation
.mc
this conversion is conceptual
and there is no actual change in the bit pattern.
.PP
When an unsigned integer is converted to long,
the value of the result is the same numerically
as that of the unsigned integer.
Thus the conversion amounts to padding with zeros on the left.
.SH
6.6 Arithmetic conversions
.LP
A great many operators cause conversions
and yield result types in a similar way.
This pattern will be called the `usual arithmetic conversions.'
.IP
First, any operands of type
.Bd char
or
.Bd short
are converted to
.Bd int,
and any of type
.Bd float
are converted to
.Bd double.
.IP
Then, if either operand is
.Bd double,
the other is converted to
.Bd double
and that is the type of the result.
.IP
Otherwise, if either operand is
.Bd long,
the other is converted to
.Bd long
and that is the type of the result.
.IP
Otherwise, if either operand is
.Bd unsigned,
the other is converted to
.Bd unsigned
and that is the type of the result.
.IP
Otherwise, both operands must be
.Bd int,
and that is the type of the result.