X3J11 Pleasanton meeting summary

Mon Oct 8 22:09:09 AEST 1990

In article <1737:Oct803:02:5890 at kramden.acf.nyu.edu> brnstnd at kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1990Oct3.184359.2348 at sq.sq.com> msb at sq.sq.com (Mark Brader) writes:
>> > 	int a[4][5];
>> > 	a[1][7] = 0;	/* undefined behavior */
>> > Dave Prosser (our Redactor) vigorously protested the above interpretation.
>> My opinion is that the protest was right and the ruling wrong.
>On what basis? If I declare char x[100][3], for example, the compiler
>might want to allocate an extra byte for each element of x. Isn't this
>allowed by the standard?

Okay, I guess there is some point in summarizing the X3J11 discussion about
this issue.

Let's first of all give names to specific relevant types:
	typedef int	cell;	/* could be any object type, not just "int" */
	typedef cell	row[5];
	typedef row	matrix[4];
Then the largest object involved is
	matrix	a;		/* same as int a[4][5]; */
There are other objects that can readily be identified here:
	a[1];			/* denotes a row object */
	a[1][4];		/* correctly denotes a cell object */

X3J11 seemed to agree that there are sufficient constraints in the
standard that one can assert the following:
	assert(sizeof a == 4*sizeof a[1] && sizeof a == 4*5*sizeof a[1][4]);
In other words, the size of an array element INCLUDES any padding necessary
for adjacent elements to abut cleanly, and there is no additional padding
included in an array object.  Thus, every implementation COULD choose to
give a[1][7] a well-defined meaning; alignment and padding are not issues.
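
As a rough sketch of that claim, the following restates the assertion using
the typedefs above and adds a check that consecutive rows abut.  The function
names sizes_abut and row_stride are mine, purely for illustration; nothing
here is text from the standard.

```c
#include <assert.h>
#include <stddef.h>

typedef int  cell;       /* could be any object type */
typedef cell row[5];
typedef row  matrix[4];

/* Returns 1 if the size relations discussed above hold here. */
int sizes_abut(void)
{
    matrix a;
    /* sizeof does not evaluate its operand, so a need not be initialized */
    return sizeof a == 4 * sizeof a[1]
        && sizeof a == 4 * 5 * sizeof a[1][4]
        && sizeof(row) == 5 * sizeof(cell);
}

/* Byte distance from row 0 to row 1, measured through char pointers
 * into the single enclosing object a. */
size_t row_stride(void)
{
    matrix a;
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```

If row_stride() equals sizeof(row), there is no gap between rows, which is
why the dispute is about what counts as "the array object" and not about
layout.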

The actual issue is, what really constitutes an array object.  Note that
in the declaration grammar, for example in 3.5.4.2, an array HAS only one
level of aggregation.  There is not officially any such thing as a "multi-
dimensional array" in C, only arrays of arrays.  (The description in such
terms in 3.3.2.1 Semantics should be considered informal English usage
of "multidimensional" for purposes of exposition, not the implicit
introduction of a technical language construct.  In fact, you have to take
that description as referring to the more precise notion of arrays of
arrays in order for the description to make any sense.)

3.3.2.1 (p.40) and 3.3.6 (p.48) state quite clearly, in the majority view
of X3J11, that subscripting an array in effect removes ONE level of a
multi-level aggregation.  The wording on p.48 is, for example:  "When an
expression that has integral type is added to or subtracted from a pointer,
the result has the type of the pointer operand.  If the pointer operand
points to an element of an array object, and the array is large enough,
the result points to an element offset from the original element such that
the difference of the subscripts ...  In other words, if the expression P
points to the i-th element of an array object, the expressions ... point
to, respectively, the i+n-th and i-n-th elements of the array object,
provided they exist.  ...  If both the pointer operand and the result
point to elements of the same array object, ... the evaluation shall not
produce an overflow; otherwise, the behavior is undefined.  Unless both
the pointer operand and the result point to elements of the same array
object, ... the behavior is undefined if the result is used as an operand
of the unary * operator."  (Note that 3.3.2.1 in effect rewrites E1[E2] as
(*(E1+(E2))) (they are "identical"), so unary * is applied in our example.)
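
To make the rewriting concrete, here is a small sketch (the function name
roundtrip is hypothetical) that stores through a[i][j] and reads back
through the expanded form.  In *(*(a + i) + j) the inner addition works on
a pointer to row, and the outer addition works on a pointer to cell that
points into the row a[i].  Under the majority reading, the "array object"
for that outer addition is the five-cell row, so the caller must keep
0 <= i < 4 and 0 <= j < 5.

```c
#include <assert.h>

typedef int  cell;
typedef cell row[5];
typedef row  matrix[4];

/* Stores v via a[i][j] and reads it back via the rewritten form.
 * Caller must keep 0 <= i < 4 and 0 <= j < 5. */
int roundtrip(int i, int j, int v)
{
    matrix a = {{0}};

    a[i][j] = v;             /* each subscript removes one level */
    return *(*(a + i) + j);  /* a + i: pointer to row; *(a + i) converts
                                to a pointer to the first cell of row i */
}
```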

The above should prepare you to understand the committee's response:

	For an array of arrays, the permitted pointer arithmetic in
	Standard section 3.3.6 Semantics (p. 48, ll. 12-40) is to be
	understood by interpreting the use of the word "object" as
	denoting the specific object determined directly by the pointer's
	type and value, NOT other objects related to that one by
	contiguity. For example, the following code has undefined behavior:

		int a[4][5];
		a[1][7] = 0;	/* undefined */

	Some conforming implementations may choose to diagnose an "array
	bounds violation", while others may choose to interpret such
	attempted accesses successfully with the "obvious" extended
	semantics.

Note that even such a subterfuge as the following would not be strictly
conforming:

	void func( int *p ) { p[7] = 0; }
	int main( void ) { int a[4][5]; func( a[1] ); return 0; }

This isn't much of a practical problem, at least not in most code that
I recall having written, because most often multidimensional "matrices"
are actually allocated as 1-dimensional arrays of the desired "cell"
type, accessed later via a pointer to one of the cells (normally the
first), and p.48 supports that usage.  Indeed, as NCEG considers ways to
add more useful notions of arrays and subarrays to the C language, such
tight constraints on what are and are not permissible operations on such
objects may well prove to be essential, at least from the point of view
of implementors of C compilers on "vector" architectures.
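
A minimal sketch of that flat-allocation idiom follows.  ROWS, COLS,
cell_at and flat_demo are illustrative names of my own, and calloc is just
one way to obtain the storage.  Because the pointer arithmetic is done
against a single array of 20 cells, an index pair such as (1, 7) stays
inside the one array object and simply aliases cell (2, 2).

```c
#include <assert.h>
#include <stdlib.h>

enum { ROWS = 4, COLS = 5 };

/* Pointer to cell (i, j) of a flat ROWS*COLS array starting at base.
 * Caller must keep 0 <= i*COLS + j < ROWS*COLS. */
int *cell_at(int *base, int i, int j)
{
    return base + (i * COLS + j);
}

/* Writes through (1, 7) and reads through (2, 2); returns -1 if the
 * allocation fails. */
int flat_demo(void)
{
    int *m = calloc(ROWS * COLS, sizeof *m);
    int result;

    if (m == NULL)
        return -1;
    *cell_at(m, 1, 7) = 99;       /* offset 1*5 + 7 == 12 */
    result = *cell_at(m, 2, 2);   /* offset 2*5 + 2 == 12, same cell */
    free(m);
    return result;
}
```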

What is missing in the standard that would be required for such punning
to be strictly conforming is some sort of guarantee that an array of
arrays of T is also in some contexts considered an array of T itself.
As it stands, the rigorous type structure shines through too plainly.
Some X3J11 members actually want precisely that, arguing that their
implementations warn about array bounds violations and that their
customers have indicated that they strongly desire that feature.