.ds :? Recent Changes to C
.PH "''''"
.OH "'\s9\f2\*(:?\fP''\\\\nP\s0'"
.EH "'\s9\\\\nP''\f2\*(:?\^\fP\s0'"
.de Ds
.DS I
.ss 18
\!.ss 18
.lg 0
..
.de De
.lg
\!.ss 12
.ss 12
.DE
..
.tr ~
.TL
Recent Changes to C
.AU "B. R. Rowland" BRR IH
.AS
The C programming language is currently in widespread use
across Bell Laboratories.
It is the primary programming language (along with FORTRAN 77)
on computers using the UNIX\(dg
.FS \(dg
UNIX is a trademark of Bell Laboratories.
.FE
operating system
and is available on many other general purpose computers such as
the IBM System/370 with TSS and OS,
the Intel 8086, and
Honeywell HIS-6080 with GCOS.
C is implemented for several of the processors produced
by the Bell System, including the
MAC-8 and 3B-20.
C is a language with a flexible variety of both control
and data structures as well as low level data access
primitives.
Recently C has evolved to meet new Bell Laboratories needs.
.P
This paper describes recent enhancements to the C language that are not
currently documented.
These include:
.DL "" compact
.LI
structure assignment
.LI
structure valued functions
.LI
structure valued parameters
.LI
enumerations
.LI
non-unique structure and union members
.LI
fully qualified structure and union references
.LE
.P
Examples of all the above are given.
.AE
.MT 4
.H 1 "INTRODUCTION"
The C programming language has been successfully used
in systems programming as well as general purpose
programming environments across Bell Laboratories
on a wide variety of computers and stored program processors.
Among these machines are the PDP-11 series, IBM/360-370 series,
VAX 11/780, UNIVAC 1100 series, Honeywell HIS-6080, Interdata 8/32,
Intel 8086, 3B Model 20, and MAC-8.
This success stems partly from the language's flexible control and
data structuring and its low level data accessing primitives,
but it is also due in large measure to the fact that
C compilers have proved relatively easy to port to
new machines and that the C language is fairly simple to learn.
.P
The definition of the C language has evolved as
C has become more widely used in Bell Laboratories.
In response to changing needs and requirements
of C programmers (as summarized in [Row~79b]),
a few extensions have been made to the C language
beyond what is described in the reference document
[Ker~78].
The extensions consist primarily of changes in the use of
structures and unions,
and the introduction of a new data type, enumeration.
Descriptions of the syntactic and semantic changes
to the C language are accompanied by examples
of the new features.
.P
As the C language changes, the compilers that implement
the language have been tracking the changes.
For the most part this involves the UNIX* system
.FS *
UNIX is a trademark of Bell Laboratories.
.FE
C compiler for the PDP-11 maintained by D. M. Ritchie
and the Portable C Compiler [Joh~78] and Lint [Joh~77]
developed by S. C. Johnson
for which design control has been transferred to Department 3621.
The changes described in this memorandum have
been incorporated into these compilers.
Other existing C compilers should follow suit
by picking up the changes from the C compiler
each is based upon.
.H 1 "NEW FEATURES"
.H 2 "Structure assignment"
Structure assignment has been added to the C language
to simplify both the source and object code associated
with transferring the value of one structure instance to
another and to allow functions to return aggregate
values when invoked.
Since many processors now contain some type
of `move block' instruction, structure assignment
will permit more efficient use of many machines.
It also makes source programs more readable.
.P
Structures may be assigned, passed as arguments to functions,
and returned by functions.
The types of structure operands taking part must be the same.
Other plausible operators, such as equality comparison
and structure casts,
are not being implemented due
to the difficulties associated with "holes"
in structures caused by alignment restrictions.
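.P
Because equality comparison is not provided, a program that needs it
has to compare structures member by member.
The following is a minimal sketch of that approach, not part of the
language change itself; the names `clock_equal' and
`assignment_copies_value' are hypothetical.
It also shows that structure assignment copies a value rather than
sharing storage.

```c
#include <assert.h>

struct clock {
	int hour, minute, second;
};

/* Hypothetical helper: compares two clocks member by member,
   so any padding bytes never take part in the comparison. */
int clock_equal(struct clock a, struct clock b)
{
	return a.hour == b.hour
	    && a.minute == b.minute
	    && a.second == b.second;
}

/* Structure assignment copies the whole value; the copy then
   compares equal to the original, and a later change to the
   copy leaves the original untouched. */
int assignment_copies_value(void)
{
	struct clock now = {13, 2, 36};
	struct clock later;

	later = now;            /* structure assignment */
	assert(clock_equal(now, later));
	later.second = 37;
	return !clock_equal(now, later) && now.second == 36;
}
```

A member-wise comparison must be kept in step with the structure
declaration whenever members are added or removed.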
.P
The following code demonstrates
the new structure assignment features.
.Ds
	struct clock{
		int hour, minute, second;
		};
	struct date{
		int year, month, day;
		struct clock time;
		};
	struct clock now={13,2,36};
	extern struct date spring();
	struct date today, tomorrow;

	struct date nextday( day ) struct date day;{
		struct date tempday;
		...
		return tempday;
		}

	main(){
		today = spring();
		tomorrow = nextday( today );
		tomorrow.time = now;
		...
		}
.De
There is a subtle defect in the
PDP-11 and VAX 11/780
implementations
of functions that return structures:
if an interrupt occurs during the return sequence,
and the same function is called reentrantly
during the interrupt,
the value returned from the first call
may be corrupted.
The problem can occur only in the presence of
true interrupts,
as in an operating system or a user program that makes
significant use of signals; ordinary recursive calls are quite safe.
This same defect is not present in the Basic-16 [Hei~79],
IBM 370 [Row~79a] or 3B C
compiler [Mit~79] implementations.
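.P
Code that may be re-entered from an interrupt or signal handler
can sidestep structure-valued returns altogether by having the caller
supply the destination.
The sketch below illustrates that conventional technique; it is not
a remedy described in this memorandum, the function names
`nextday_into' and `demo_nextday' are hypothetical, and the date
arithmetic is deliberately naive.

```c
struct clock {
	int hour, minute, second;
};
struct date {
	int year, month, day;
	struct clock time;
};

/* Hypothetical variant of nextday(): the caller passes the
   address of the result, so no structure value travels
   through the function return sequence.  No month or year
   rollover is attempted. */
void nextday_into(const struct date *day, struct date *out)
{
	*out = *day;            /* structure assignment */
	out->day = out->day + 1;
}

/* Small driver returning the day number computed above. */
int demo_nextday(void)
{
	struct date today = {1979, 5, 14, {13, 2, 36}};
	struct date tomorrow;

	nextday_into(&today, &tomorrow);
	return tomorrow.day;
}
```

The result lives in storage owned by each caller, so a nested call
made during an interrupt writes to a different object.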
.H 2 "Enumeration Type"
There is a new C data type analogous to the scalar types of Pascal [Wir~71].
Enumerations are unique types with named constants.
They serve to replace in part the use of #\f2define\^\fPd
constants in C, but they offer the additional advantage
of scoped constant names and strong type checking
in the use of such names.
.P
To the type-specifiers in the syntax on p. 193 of the
C book add
.Ds
		\f2enum-specifier\fP
.De
with syntax
.Ds
	\f2enum-specifier:\^\fP
		\^\fPenum { \f2enum-list \fP}
		\^\fPenum \f2identifier \fP{ \f2enum-list \fP}
		\^\fPenum \f2identifier \fP
.De
.Ds
	\f2enum-list:\^\fP
		\f2enumerator\^\fP
		\f2enum-list \^\fP, \f2enumerator\fP

	\f2enumerator:\^\fP
		\f2identifier\^\fP
		\f2identifier \^\fP=\f2 constant-expression\fP
.De
The role of the identifier in the enum-specifier
is entirely analogous to that of the structure tag
in a struct-specifier; it names a particular enumeration.
For example,
.Ds
	enum color { chartreuse, burgundy, claret, winedark };
	...
	enum color *cp, col;
	...
	col = claret;
	cp = & col;
	...
	if( *cp == burgundy )...
.De
makes `color'
the enumeration-tag of a type describing various colors,
and then declares `cp'
as a pointer to an object of that type,
and `col'
as an object of that type.
.P
The identifiers in the enum-list are declared as constants,
and may appear wherever constants are required.
If no enumerators with "="
appear, then the values of the constants begin at 0
and increase by 1 as the declaration is read from left to right.
An enumerator with "="
gives the associated identifier the value
indicated;
subsequent identifiers continue the progression
from the assigned value.
.Ds
	enum interrupt{
		halt = 0,
		bad_instr = 01001,
		mem_fault,
		div_zero = 02001,
		overflow,
		underflow
		} icode;
	...
	if( (int)icode & 02000 )/* arithmetic fault */
	...
.De
.P
The previous example illustrates the explicit
specification of enumeration values.
In particular, the symbol `overflow' has internal value 02002.
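.P
The numbering rule can be checked directly.
The sketch below repeats the enumeration from the example and
annotates the values that follow from the rule; the function
`is_arithmetic_fault' is a hypothetical wrapper around the
bit test shown above.

```c
enum interrupt {
	halt = 0,
	bad_instr = 01001,
	mem_fault,              /* continues from 01001: 01002 */
	div_zero = 02001,
	overflow,               /* continues from 02001: 02002 */
	underflow               /* 02003 */
};

/* Returns 1 when the code has the 02000 bit set, which in
   this numbering marks the arithmetic faults. */
int is_arithmetic_fault(enum interrupt icode)
{
	return ((int)icode & 02000) != 0;
}
```

Only `div_zero', `overflow', and `underflow' have the 02000 bit set,
so the test separates them from the other codes.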
.P
Enumeration constants must all be distinct,
and, unlike structure members,
are drawn from the same set as ordinary identifiers.
.P
Objects of a given enumeration type are regarded as having
a type distinct from objects of all other types,
and
Lint
flags type mismatches.
In the
PDP-11 
implementation, all enumeration variables
are treated as if they were
.I int .
Portable C Compiler implementations map enumerations into
a convenient storage unit (\f2char, short\^\fP, or \f2int\fP)
depending on the values associated with the enumeration constants.
.H 2 "Non-Unique Structure Member Names"
The C language has been changed in a nearly upwards compatible
fashion to allow more flexibility in the reuse of structure member
and structure field names.
The obscure case in which upwards compatibility is not maintained
is explained in detail at the end of this section.
This enhancement permits more natural structure and union
member naming conventions in C programs and results
in stronger type checking of both structure and union member
references.
.H 3 "Former Member Name Restrictions."
Prior to this change,
there were only two ways in which structure member names could
be reused.
.AL A
.LI
Member names of two distinct structures declared at any block levels
(including different block levels)
that represented the same member type and offset could be identical.
For example, the name `xyz' is used in both of the following two
structures:
.Ds
	struct s1{
		long abc;
		char xyz;
		float def;
		};
	struct s2{
		long abc;
		char xyz;
		short jkl;
		};
.De
.P
With such a construction, the structure member name `xyz'
could be referenced from any structure variable or any
pointer without ambiguity.
.LI
Member names could be reused within a new name scoping (block) level.
In the following code section,
the member name `f_one' is reused:
.Ds
	struct outer{
		int f_zero:2,
		    f_one:4,
		    f_two:10;
		struct outer *next;
		};

	function(){
		struct inner{
			int f_one, g_one, h_one;
			};
		...
		}
.De
.P
When member names are redeclared at different block
levels, the innermost declaration serves to block
the outer declarations of the same name within the
inner scope.
In the previous example,
the four-bit field `f_one' could \f2not\^\fP be referenced
(even from structures that are explicitly declared to be type
`outer') within the function `function'.
.LE
.H 3 "New Flexibility for Member Names."
The language change for structure member names allows
the reuse or redeclaration of structure member or field
names with only a single restriction:
.AL 1
.LI
A particular name may not be used for two distinct
members within the same structure.
(However, a name may be reused within nested structures.)
.LE
.P
The impact of this change is stronger type checking
for structures and unions.
Call a structure (or union) member unique if it is declared
only once, or if all its declarations conform to the requirements
of case A above.
If a uniquely-named member is mentioned in a structure
reference in which it is not a member of the structure,
a warning diagnostic is issued.
This allows old C programs that violate the language
rules to continue to compile.
However, if a member that is not uniquely named
is used in a structure reference in which it is
not a member of the structure, a fatal diagnostic
is issued.
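.P
As an illustration of the new rule, the sketch below reuses the
member name `count' in two unrelated structures with different
types and offsets.
The structure and function names are hypothetical; each reference
is resolved through the type of the structure it is applied to.

```c
struct point {
	int x;
	int count;              /* an int, following one int */
};
struct tally {
	char tag;
	long total;
	long count;             /* a long, at a different offset */
};

/* Each reference to count is resolved through the type of
   the structure pointer on its left-hand side. */
long sum_counts(struct point *p, struct tally *t)
{
	return p->count + t->count;
}

/* Small driver: 3 from the point plus 4 from the tally. */
int demo_counts(void)
{
	struct point p = {0, 3};
	struct tally t = {'a', 10L, 4L};

	return (int)sum_counts(&p, &t);
}
```

Under the former rules these two declarations could not coexist at
the same block level, since the two `count' members differ in type
and offset.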
.P
The case in which upwards compatibility is not maintained
involves structure member name redeclarations of type (B)
described above.
.Ds
	struct x{
		int a,b;
		} x_obj;

	main(){
		int *ip;
		struct {
			int b,a;
			} y_obj;

		... ip->a ...
		... y_obj.a ...
		... x_obj.a ...
		}
.De
.P
In the example above, prior to the language change,
each of the references `ip->a', `y_obj.a', and `x_obj.a'
was considered legitimate, and each used an offset of two bytes
(on a sixteen-bit processor, such as the PDP-11)
for the integer referenced by `a',
because the inner declaration of `a' hid the outer one.
With non-unique structure members,
the integer referenced by `a' in `x_obj.a' would have
an offset of zero bytes from the address of `x_obj'.
The reference `ip->a' could either be considered a user
error by a particular compiler or a warning could be issued
and the innermost declaration of `a' could be used to resolve
the reference.
Because of the lack of existing code with such potential
ambiguities for most PCC compiler instances, a
fatal diagnostic will be issued by the PCC for `ip->a'.
.H 2 "Complete Structure/Union Member Reference Qualifications"
In past C compiler instances, a reference to a structure or
a union member could be abbreviated in some cases.
A structure or union member reference is a chain of member
references (qualifications) that are prefixed by either a
pointer to a structure or union
or a structure or union proper.
Since each qualification implies the addition of an offset
within an address computation, it was possible in the past to
omit those qualifications that had an offset of zero.
Zero offsets occur in the first member of a structure and
in all members of unions.
With the two following declarations:
.Ds
	struct xx{
		struct yy{
			int y1; char y2;
			} ym;
		...
		} *xp;

	union u{
		struct a{
			int a1,a2,a3;
			} mema;
		struct b{
			char b1,b2,b3;
			} memb;
		} *up;

.De
the following references were allowed:
.Ds
	xp->y2	    /* same as */	xp->ym.y2

	up->b2	    /* same as */	up->memb.b2
.De
.P
Due to the ambiguities that can arise with incomplete
qualifications and non-unique structure and union member
names, \f2complete qualifications\^\fP are now required
for structure and union member references in the C
language.
This change also serves to enforce stronger type checking
of structure and structure pointer use within C.
At the present time, incomplete qualifications will be
flagged with user warning messages.
Lint, run in its heuristic mode, will suggest how to complete
an incomplete qualification.
Union members that are structures must be named, so that
complete qualifications can be constructed.
.P
Of the references in the previous example, only the following
structure and union member references are now legitimate:
.Ds
	xp->ym.y2

	up->memb.b2
.De
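.P
The sketch below restates the two declarations in compilable form
and confirms that the qualifications formerly omitted are the ones
with zero offset.
The member `other' stands in for the elided members of `xx' and is
hypothetical, as are the function names.

```c
#include <stddef.h>

struct yy {
	int y1;
	char y2;
};
struct xx {
	struct yy ym;           /* first member: offset zero */
	int other;              /* hypothetical stand-in member */
};
union u {
	struct a { int a1, a2, a3; } mema;
	struct b { char b1, b2, b3; } memb;
};

/* Fully qualified references: every member in the chain is named. */
char get_y2(struct xx *xp) { return xp->ym.y2; }
char get_b2(union u *up)   { return up->memb.b2; }

/* The old abbreviation relied on these offsets being zero. */
int skipped_offsets_are_zero(void)
{
	return offsetof(struct xx, ym) == 0
	    && offsetof(union u, mema) == 0
	    && offsetof(union u, memb) == 0;
}

/* Stores through full qualifications and reads the values back. */
int demo_refs(void)
{
	struct xx x;
	union u v;

	x.ym.y2 = 'q';
	v.memb.b2 = 'r';
	return get_y2(&x) == 'q' && get_b2(&v) == 'r';
}
```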
.P
.H 2 "Tag Names"
Structure, union, and enumeration tag names are the
names associated with a declared type and always appear
after the keywords
\f2struct, union\^\fP, and \f2enum\fP,
as in the following examples:
.Ds
	typedef enum bool {false, true} bool;
	struct list *head;
	union cell {unsigned word; char byte[2];};
.De
.P
Previous implementations of C required that all
structure and union tag names be distinct from
structure and union member names.
This restriction has been removed from the C
language.
As a result, four name pools now exist:
.AL I
.LI
# \f2define\^\fPd macro names
.br
(Processed separately by /lib/cpp.)
.LI
structure, union, and enumeration tag names
.LI
structure and union members
.br
(These may be non-unique.)
.LI
all other names
.br
(Includes: typedef names; array, structure instance,
and variable names; and enumeration constant names.)
.LE
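.P
A short sketch shows one identifier living in several pools at once.
All names here are hypothetical.
Macro names are left out because the preprocessor would replace every
later occurrence of the name before the compiler proper sees it.

```c
struct item {                   /* tag pool: item */
	int item;               /* member pool: item */
};

int count_item(void)
{
	struct item item;       /* pool of all other names: item */

	item.item = 5;
	return item.item;
}

enum shade { dark, light };     /* tag shade; constants dark, light */
typedef enum shade shade;       /* typedef name shade: other names */

int shade_value(void)
{
	shade s = light;        /* light is the second constant */

	return (int)s;
}
```

The typedef name `shade' and the enumeration constants share a pool,
so a constant could not also be called `shade'.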
.H 2 "Vertical Tab Character Literal"
A new character literal has been added to the C language.
The vertical tab character (VT, octal 013 in ASCII and EBCDIC)
can now be represented as '\ev' in addition to '\e013'.
This character can also be used within character string literals
(e.g., "Upper~left\et\et\et\ev\evLower~right\en").
Vertical tab is now included in the definition of \f2white space\^\fP
and thus can be used to delimit tokens in a C source file.
.H 1 "SUMMARY"
A significant number of changes to the C language
have occurred since the last release of the C
reference manual [Ker~78].
The changes affect mainly the use of structures and
unions and the naming restrictions in the language.
Cooperative efforts among C compiler and Lint
implementors are leading to coordinated releases
of these compilation tools with new language features.
.H 1 "ACKNOWLEDGEMENTS"
The language changes described in this memorandum
are a result of language design work performed by
Dennis Ritchie and Steve Johnson.
A nontrivial effort was involved in
reviewing a host of requested language enhancements
and selecting and refining those that were compatible
with the nature of the C language, implementable in
existing tools, and compatible with nearly all existing
code written in C.
.HU "REFERENCES"
.VL 10
.LI "[Hei~79]"
W. C. Heiny.
.I "Basic 16 C Compiler Implementation,\^"
Internal Report,
Bell Laboratories (May 1979).
.LI "[Joh~77]"
S. C. Johnson.
.I "Lint, a C Program Checker,\^"
Bell Laboratories (May 1979).
.LI "[Joh~78]"
S. C. Johnson.
A Portable Compiler: Theory and Practice,
.I "Conference Record of the Fifth Annual ACM Conference on"
.I "Principles of Programming Languages,\^"
pp.\ 97-104,
Tucson, AZ (Jan. 23, 1978).
.LI "[Ker~78]"
B. W. Kernighan and D. M. Ritchie.
.I "The C Programming Language,\^"
Prentice-Hall, Englewood Cliffs, NJ
(1978).
.LI "[Mit~79]"
R. W. Mitze.
.I "An Overview of C Compilation of UNIX User Processes on the 3B,\^"
Internal Report,
Bell Laboratories (Mar. 1979).
.LI "[Row~79a]"
B. R. Rowland.
.I "Status Report for IBM 370 C Compiler and its Indian Hill"
.I "TSS Implementation,\^"
Internal Report, Bell Laboratories (Feb. 1979).
.LI "[Row~79b]"
B. R. Rowland.
.I "C Language Enhancements: Laboratory 252 Recommendations,\^"
Internal Report, Bell Laboratories (Mar. 1979).
.LI "[Wir~71]"
N. Wirth.
The Programming Language PASCAL,
.I "Acta Informatica\^"
.BR 1 (1):35-63.
.LE
.sp 1v
.I "May 1979"