.SH
10. External definitions
.LP
A C program consists of a sequence of external definitions.
An external definition declares an identifier to have storage class
.Bd extern
(by default) or perhaps
.Bd static,
and a specified type.
The type-specifier (\(sc8.2) may also be empty, in which case the type is taken to be \fGint\fR.
The scope of external definitions persists to the end of the file in which they are declared just as the effect of declarations persists to the end of a block.
The syntax of external definitions is the same as that of all declarations, except that only at this level may the code for functions be given.
.SH
10.1 External function definitions
.LP
Function definitions have the form
.SY
function-definition:
    decl-specifiers\*(op function-declarator function-body
.ES
The only sc-specifiers allowed among the decl-specifiers are
.Bd extern
or
.Bd static;
see \(sc11.2 for the distinction between them.
A function declarator is similar to a declarator for a `function returning ...' except that it lists the formal parameters of the function being defined.
.SY
function-declarator:
    declarator \fG( \fIparameter-list\*(op \fG)
.ES
.SY
parameter-list:
    identifier
    identifier \fG,\fI parameter-list
.ES
The function-body has the form
.SY
function-body:
    declaration-list compound-statement
.ES
The identifiers in the parameter list, and only those identifiers, may be declared in the declaration list.
Any identifiers whose type is not given are taken to be
.Bd int.
The only storage class which may be specified is
.Bd register;
if it is specified, the corresponding actual parameter will be copied, if possible, into a register at the outset of the function.
.PP
A simple example of a complete function definition is
.PR
int max\|(\|a, b, c)
int a, b, c;
{
    int m;
    m = \|(\|a\|>\|b\|)? a\|:\|b\|;
    return\|(\|m\|>\|c? m\|:\|c\|)\|;
}
.EP
Here `int' is the type-specifier; `max(a,@b,@c)' is the function-declarator; `int@a,@b,@c;' is the declaration-list for the formal parameters; `{@.\|.\|.@}' is the block giving the code for the statement.
The parentheses in the
.Bd return
are not required.
.PP
C converts all \fGfloat\fR actual parameters to \fGdouble\fR, so formal parameters declared \fGfloat\fR have their declaration adjusted to read \fGdouble\fR.
Also, since a reference to an array in any context (in particular as an actual parameter) is taken to mean a pointer to the first element of the array, declarations of formal parameters declared `array of ...' are adjusted to read `pointer to ...'.
Finally, because neither structures nor functions can be passed to a function, it is useless to declare a formal parameter to be a structure or function (pointers to structures or functions are of course permitted).
.PP
A free \fGreturn\fR statement is supplied at the end of each function definition, so running off the end causes control, but no value, to be returned to the caller.
.SH
10.2 External data definitions
.LP
An external data definition has the form
.SY
data-definition:
    declaration
.ES
The storage class of such data may be
.Bd extern
(which is the default) or
.Bd static,
but not
.Bd auto
or
.Bd register.
.SH
11. Scope rules
.LP
A C program need not all be compiled at the same time: the source text of the program may be kept in several files, and precompiled routines may be loaded from libraries.
Communication among the functions of a program may be carried out both through explicit calls and through manipulation of external data.
.PP
Therefore, there are two kinds of scope to consider: first, what may be called the \fIlexical scope\fR of an identifier, which is essentially the region of a program during which it may be used without drawing `undefined identifier' diagnostics; and second, the scope associated with external identifiers, which is characterized by the rule that references to the same external identifier are references to the same object.
.SH
11.1 Lexical scope
.LP
The lexical scope of identifiers declared in external definitions persists from the definition through the end of the file in which they appear.
The lexical scope of identifiers which are formal parameters persists through the function with which they are associated.
The lexical scope of identifiers declared at the head of blocks persists until the end of the block.
The lexical scope of labels is the whole of the function in which they appear.
.PP
Because all references to the same external identifier refer to the same object (see \(sc11.2) the compiler checks all declarations of the same external identifier for compatibility; in effect their scope is increased to the whole file in which they appear.
.PP
In all cases, however, if an identifier is explicitly declared at the head of a block, including the block constituting a function, any declaration of that identifier outside the block is suspended until the end of the block.
.PP
Remember also (\(sc8.5) that identifiers associated with ordinary variables on the one hand and those associated with structure and union members and tags on the other form two disjoint classes which do not conflict.
.Bd Typedef
names are in the same class as ordinary identifiers.
They may be redeclared in inner blocks, but an explicit type must be given in the inner declaration:
.PR
typedef float distance;
\&. . .
{
    auto int distance;
    . . .
.EP
The
.Bd int
must be present in the second declaration, or it would be taken to be a declaration with no declarators and type
.Bd distance.*
.FS
*It is agreed that the ice is thin here.
.FE
.SH
11.2 Scope of externals
.LP
.MC
If a function refers to an identifier declared to be
.mc
\fGextern\fR,
then somewhere among the files or libraries constituting the complete program there must be an external definition for the identifier.
All functions in a given program which refer to the same external identifier refer to the same object, so care must be taken that the type and extent specified in the definition are compatible with those specified by each function which references the data.
.PP
The appearance of the
.Bd extern
keyword in an external definition indicates that storage for the identifiers being declared will be allocated in another file.
Thus in a multi-file program, an external data definition without the
.Bd extern
specifier must appear in exactly one of the files.
Any other files which wish to give an external definition for the identifier must include the
.Bd extern
in the definition.
The identifier can be initialized only in the declaration where storage is allocated.
.PP
Identifiers declared
.Bd static
at the top level in external definitions are not visible in other files.
.MC
Functions may be declared
.Bd static
to make their definition local to a file.
.mc
.SH
12. Compiler control lines
.LP
The C compiler contains a preprocessor capable of macro substitution, conditional compilation, and inclusion of named files.
Lines beginning with `#' communicate with this preprocessor.
These lines have syntax independent of the rest of the language; they may appear anywhere and have effect which lasts (independent of scope) until the end of the source program file.
.SH
12.1 Token replacement
.LP
A compiler-control line of the form
.SY
\fG# define \fIidentifier token-string
.ES
(note: no trailing semicolon) causes the preprocessor to replace subsequent instances of the identifier with the given string of tokens.
A line of the form
.SY
\fG# define \fIidentifier\fG( \fIidentifier\fG , ... ) \fItoken-string
.ES
where there is no space between the first identifier and the `(', is a macro definition with arguments.
Subsequent instances of the first identifier followed by a `(', a sequence of tokens delimited by commas, and a `)' are replaced by the token string in the definition.
Each occurrence of an identifier mentioned in the formal parameter list of the definition is replaced by the corresponding token string from the call.
The actual arguments in the call are token strings separated by commas; however commas in quoted strings or protected by parentheses do not separate arguments.
The number of formal and actual parameters must be the same.
Text inside a string or a character constant is not subject to replacement.
.PP
In both forms the replacement string is rescanned for more defined identifiers.
In both forms a long definition may be continued on another line by writing `\e' at the end of the line to be continued.
.PP
This facility is most valuable for definition of `manifest constants', as in
.PR
# define TABSIZE 100
    .\|.\|.
int table[TABSIZE];
.EP
A control line of the form
.SY
\fG# undef \fIidentifier
.ES
causes the identifier's preprocessor definition to be forgotten.
.SH
12.2 File inclusion
.LP
A compiler control line of the form
.SY
\fG# include "\fIfilename\|\fG"
.ES
causes the replacement of that line by the entire contents of the file \fIfilename\fR.
.PP
The named file is searched for first in the directory of the original source file, and then in a sequence of standard places.
Alternatively, a control line of the form
.SY
\fG# include <\fIfilename\|\fG>
.ES
searches only the standard places, and not the directory of the source file.
.PP
Includes may be nested.
.SH
12.3 Conditional compilation
.LP
A compiler control line of the form
.SY
\fG# if \fIconstant-expression
.ES
checks whether the constant expression (see \(sc15) evaluates to non-zero.
A control line of the form
.SY
\fG# ifdef \fIidentifier
.ES
checks whether the identifier is currently defined in the preprocessor; that is, whether it has been the subject of a
.Bd #define
control line.
A control line of the form
.SY
\fG# ifndef \fIidentifier
.ES
checks whether the identifier is currently undefined in the preprocessor.
.PP
All three forms are followed by an arbitrary number of lines, possibly containing a control line
.SY
\fG# else
.ES
and then by a control line
.SY
\fG# endif
.ES
If the checked condition is true then any lines between
.Bd #else
and
.Bd #endif
are ignored.
If the checked condition is false then any lines between the test and an
.Bd #else
or, lacking an
.Bd #else,
the
.Bd #endif,
are ignored.
.PP
These constructions may be nested.
.SH
12.4 Line control
.LP
For the benefit of other preprocessors which generate C programs, a line of the form
.SY
\fG# line \fIconstant identifier
.ES
causes the compiler to believe, for purposes of error diagnostics, that the next line number is given by the constant and the current input file is named by the identifier.
If the identifier is absent the remembered file name does not change.
.SH
13. Implicit declarations
.LP
It is not always necessary to specify both the storage class and the type of identifiers in a declaration.
Sometimes the storage class is supplied by the context: in external definitions, and in declarations of formal parameters and structure members.
In a declaration inside a function, if a storage class but no type is given, the identifier is assumed to be \fGint\fR; if a type but no storage class is indicated, the identifier is assumed to be \fGauto\fR.
An exception to the latter rule is made for functions, since \fGauto\fR functions are meaningless (C being incapable of compiling code into the stack).
If the type of an identifier is `function returning ...', it is implicitly declared to be \fGextern\fR.
.PP
In an expression, an identifier followed by \fG(\fR and not currently declared is contextually declared to be `function returning \fGint\fR'.
.SH
14. Types revisited
.LP
This section summarizes the operations which can be performed on objects of certain types.
.SH
14.1 Structures and unions
.LP
There are only two things that can be done with a structure or union: name one of its members (by means of the \fG\|.\|\fR operator); or take its address (by unary \fG&\fR).
Other operations, such as assigning from or to it or passing it as a parameter, draw an error message.
In the future, it is expected that these operations, but not necessarily others, will be allowed.
.PP
\(sc7.1 says that in a direct or indirect structure reference (with \fG.\fR or \(mi>) the name on the right must be a member of the structure named or pointed to by the expression on the left.
To allow an escape from the typing rules, this restriction is not firmly enforced by the compiler.
In fact, any lvalue is allowed before `\fB.\fR', and that lvalue is then assumed to have the form of the structure of which the name on the right is a member.
Also, the expression before a `\(mi>' is required only to be a pointer or an integer.
If a pointer, it is assumed to point to a structure of which the name on the right is a member.
If an integer, it is taken to be the absolute address, in machine storage units, of the appropriate structure.
.PP
Such constructions are non-portable.
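.PP
As an illustration of the two permitted operations, the following sketch names members with \fG.\fR and \(mi>, and hands the structure to another function only through its address.
The names point, set_point and sum_point are hypothetical and are not taken from this manual; the functions are written with prototypes only so that a present-day compiler will accept them.

```c
/* Hypothetical structure, used only for illustration. */
struct point {
    int x;
    int y;
};

/* The structure is reached through a pointer; it is never passed by value. */
void set_point(struct point *pp, int x, int y)
{
    pp->x = x;              /* indirect member reference */
    pp->y = y;
}

int sum_point(void)
{
    struct point p;

    set_point(&p, 3, 4);    /* take the address with unary & */
    return p.x + p.y;       /* name members with the . operator */
}
```

Calling sum_point() exercises both operations without assigning or passing the structure itself.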
.SH
14.2 Functions
.LP
There are only two things that can be done with a function: call it, or take its address.
If the name of a function appears in an expression not in the function-name position of a call, a pointer to the function is generated.
Thus, to pass one function to another, one might say
.PR
int f(\|\|);
    ...
g(\|f\|);
.EP
Then the definition of \fIg \fRmight read
.PR
g\|(\|funcp\|)
int (\**funcp)\|(\|\|);
{
    .\|.\|.
    (\**funcp)\|(\|\|);
    .\|.\|.
}
.EP
Notice that \fIf\fR was declared explicitly in the calling routine since its first appearance was not followed by \fG(\fR\|.
.SH
14.3 Arrays, pointers, and subscripting
.LP
Every time an identifier of array type appears in an expression, it is converted into a pointer to the first member of the array.
Because of this conversion, arrays are not lvalues.
By definition, the subscript operator \fG[\|]\fR is interpreted in such a way that `E1[E2]' is identical to `\**(\|(\|E1)\|+\|(E2\|)\|)'.
Because of the conversion rules which apply to \fG+\fR, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1.
Therefore, despite its asymmetric appearance, subscripting is a commutative operation.
.PP
A consistent rule is followed in the case of multi-dimensional arrays.
If E is an \fIn\|\fR-dimensional array of rank $i times j times ... times k$, then E appearing in an expression is converted to a pointer to an (\fIn\fR\(mi1)-dimensional array with rank $j times ... times k$.
If the \fG\**\fR operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to (\fIn\fR\(mi1)-dimensional array, which itself is immediately converted into a pointer.
.PP
For example, consider
.PR
int x[3][5];
.EP
Here \fIx\fR is a 3\(mu5 array of integers.
When \fIx\fR appears in an expression, it is converted to a pointer to (the first of three) 5-membered arrays of integers.
In the expression `x[\|i\|]', which is equivalent to `\**(x+i)', \fIx\fR is first converted to a pointer as described; then \fIi\fR is converted to the type of \fIx\fR, which involves multiplying \fIi\fR by the length of the object to which the pointer points, namely 5 integer objects.
The results are added and indirection applied to yield an array (of 5 integers) which in turn is converted to a pointer to the first of the integers.
If there is another subscript the same argument applies again; this time the result is an integer.
.PP
It follows from all this that arrays in C are stored row-wise (last subscript varies fastest) and that the first subscript in the declaration helps determine the amount of storage consumed by an array but plays no other part in subscript calculations.
.SH
14.4 Explicit pointer conversions
.LP
.MC
Certain conversions involving pointers are permitted but have implementation-dependent aspects.
They are all specified by means of an explicit type-conversion operator, \(sc\(sc7.2 and 8.7.
.PP
A pointer may be converted to any of the integral types large enough to hold it.
Whether an \fGint\fR or \fGlong\fR is required is machine dependent.
The mapping function is also machine dependent, but is intended to be unsurprising to those who know the addressing structure of the machine.
Details for some particular machines are given below.
.PP
An object of integral type may be explicitly converted to a pointer.
The mapping always carries an integer converted from a pointer back to the same pointer, but is otherwise machine dependent.
.PP
A pointer to one type may be converted to a pointer of another type.
The resulting pointer may cause addressing exceptions upon use if the subject pointer does not refer to an object suitably aligned in storage.
It is guaranteed that a pointer to an object of a given size may be converted to a pointer to an object of a smaller size and back again without change.
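.PP
The round-trip guarantee just stated can be sketched as follows.
The function name round_trip_ok is hypothetical, and the prototype style is that of a present-day compiler rather than of this manual; the sketch shows only the conversion to a pointer to a smaller object and back.

```c
/* Convert a double pointer to a char pointer and back again.
   Returns 1 if the round trip yields the original pointer. */
int round_trip_ok(void)
{
    double d = 22.0 / 7.0;
    double *dp = &d;
    char *cp;
    double *back;

    cp = (char *) dp;        /* pointer to a smaller object */
    back = (double *) cp;    /* and back again, unchanged */
    return back == dp && *back == d;
}
```

The reverse direction, starting from an arbitrary \fGchar\fR pointer, carries no such guarantee because the address may not be suitably aligned.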
.PP
For example, a storage-allocation routine might accept a size (in bytes) of an object to allocate, and return a \fGchar\fR pointer; it might be used in this way:
.PR
extern char *alloc();
double *dp;
dp = (double *) alloc(sizeof(double));
*dp = 22.0 / 7.0;
.EP
\fGalloc\fR must ensure (in a machine-dependent way) that its return value is suitable for conversion to a pointer to \fGdouble\fR; then the \fIuse\fR of the function is portable.
.PP
The pointer representation on the \*(pd corresponds to a 16-bit integer and is measured in bytes.
\fGchars\fR have no alignment requirements; everything else must have an even address.
.PP
On the Honeywell 6000, a pointer corresponds to a 36-bit integer; the word part is in the left 18 bits, and the two bits that select the character in a word are just to their right.
Thus \fGchar\fR pointers are measured in units of $2 sup 16$ bytes; everything else is measured in units of $2 sup 18$ machine words.
\fGdouble\fR quantities and aggregates containing them must lie on an even word address (0 mod $2 sup 19$).
.PP
The IBM 370 and the \*I \*(I2 are similar.
On both, addresses are measured in bytes and occupy a 32-bit integer; elementary objects must be aligned on a boundary equal to their length, so pointers to \fGshort\fR must be 0 mod 2, to \fGint\fR and \fGfloat\fR 0 mod 4, and to \fGdouble\fR 0 mod 8.
Aggregates are aligned on the strictest boundary required by any of their constituents.
.PP
Addressing on the \*I \*(I1 is like the \*(I2, except that a pointer corresponds to a 16-bit integer, and \fGlong\fR objects need only be aligned on a 2-byte boundary.
.mc
.SH
15. Constant expressions
.LP
In several places C requires expressions which evaluate to a constant: after
.Bd case,
as array bounds, and in initializers.
In the first two cases, the expression can involve only integer constants, character constants, and
.Bd sizeof
expressions, possibly connected by the binary operators
.SY
.R
+ \(mi \** / % & \(or \*^ << >> == != < > <= >=
.ES
or by the unary operators
.SY
\(mi \*~
.ES
or by the ternary operator
.SY
.R
? :
.ES
Parentheses can be used for grouping, but not for function calls.
.PP
A bit more latitude is permitted for initializers; besides constant expressions as discussed above, one can also apply the unary \fG&\fR operator to external or static objects,
.MC
and (under UNIX)
.mc
to external or static arrays subscripted with a constant expression.
The unary \fG&\fR can also be applied implicitly by appearance of unsubscripted arrays and functions.
The basic rule is that initializers must evaluate either to a constant or to the address of a previously declared external or static object plus or minus a constant.
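.PP
The three contexts named in this section can be sketched together.
TABSIZE is borrowed from the example in \(sc12.1; the names table, mid, end, classify, table_span and mid_offset are hypothetical, and the code is written for a present-day compiler rather than in the dialect of this manual.

```c
#define TABSIZE 100                  /* manifest constant */

/* Array bound: integer constants joined by binary operators. */
static int table[TABSIZE * 2 + 1];

/* Initializer: address of a static array subscripted with a constant. */
static int *mid = &table[TABSIZE];

/* Initializer: unsubscripted array (implicit address) plus a constant. */
static int *end = table + sizeof(table) / sizeof(table[0]);

int classify(int c)
{
    switch (c) {
    case 'a' + 1:                        /* character constant plus integer */
        return 1;
    case (sizeof(int) > 2 ? 10 : 20):    /* sizeof, relational, ternary */
        return 2;
    default:
        return 0;
    }
}

/* Helpers that expose the initialized pointers as element offsets. */
int table_span(void) { return (int) (end - table); }
int mid_offset(void) { return (int) (mid - table); }
```

Every bound, case label and initializer above is evaluated before the program runs; replacing any of them with a function call would be rejected.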