[TUHS] Maximum Array Sizes in 16 bit C

Larry McVoy lm at mcvoy.com
Sat Sep 21 01:30:44 AEST 2024


On Sat, Sep 21, 2024 at 01:07:11AM +1000, Dave Horsfall wrote:
> On Fri, 20 Sep 2024, Paul Winalski wrote:
> 
> > On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz at gmail.com> wrote:
> > 
> >       In my first C programming job I saw the source to V7 grep which
> >       had a "foo[-2]" construct.
> > 
> > That sort of thing is very dangerous with modern compilers.  Does K&R C
> > require that variables be allocated in the order that they are declared?  If
> > not, you're playing with fire.  To get decent performance out of modern
> > processors, the compiler must perform data placement to maximize cache
> > efficiency, and that practically guarantees that you can't rely on
> > out-of-bounds array references.
> 
> [...]
> 
> Unless I'm mistaken (quite possible at my age), the OP was referring to 
> that in C, pointers and arrays are pretty much the same thing i.e. 
> "foo[-2]" means "take the pointer 'foo' and go back two things" (whatever 
> a "thing" is).

Yes, but that was a stack variable.  Let me see if I can say it more clearly.


foo()
{
	int	a = 1, b = 2;
	int	alias[5];

	alias[-2] = 0;		// try and set a to 0.
}

In v7 days, the stack would look like

	[stuff]
	[2 bytes for a]
	[2 bytes for b]
	[2 bytes for the alias address, which I think points forward]
	[10 bytes for alias contents]

I'm hazy on how the space for alias[] is allocated, so I made that up.  It's
probably something like I said but Paul (or someone) will correct me.
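
One way to stop guessing is to ask the compiler where it actually put things.
What follows is a minimal sketch, not anything from v7: probe() is a
hypothetical helper name, and taking the addresses forces a and b into memory,
which by itself can change what the compiler does with them.

```c
#include <stdint.h>

/*
 * Hypothetical helper: report where the compiler placed a and b
 * relative to alias[0], in bytes.  A negative offset means the
 * variable sits below alias[] in memory.
 */
void probe(long *a_off, long *b_off)
{
	int	a = 1, b = 2;
	int	alias[5] = { 0 };

	/*
	 * Compare the addresses as integers.  Subtracting pointers to
	 * unrelated objects directly is undefined behavior.
	 */
	*a_off = (long)((intptr_t)&a - (intptr_t)alias);
	*b_off = (long)((intptr_t)&b - (intptr_t)alias);
}
```

Print the two offsets from main() and rebuild with different compilers or
optimization levels.  Nothing promises the numbers stay the same, which is
the whole problem with alias[-2].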

When using a negative index for alias[], the coder is assuming that the stack
variables are placed in the order they were declared.  Paul tried to explain
that _might_ be true but is not always true.  Modern compilers will look to see
which variables are used the most in the function, and place them next to
each other so that if you have the cache line for one heavily used variable,
the other one is right there next to it.  Like so:

	int	heavy1 = 1;
	int	rarely1 = 2;
	int	spacer[10];
	int	heavy2 = 3;
	int	rarely2 = 4;

The compiler might figure out that heavy{1,2} are used a lot and lay out the
stack like so:

	[2 bytes (or 4 or 8 these days) for heavy1]
	[bytes for heavy2]
	[bytes for rarely1]
	[bytes for spacer[10]]
	[bytes for rarely2]
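
For contrast, the one place C does promise declaration order is inside a
struct: members are laid out at increasing addresses in the order they are
declared.  A small sketch, where struct frame and ints_before_alias() are
made-up names standing in for the locals of foo():

```c
#include <stddef.h>

/* Hypothetical struct standing in for the locals in foo(). */
struct frame {
	int	a;
	int	b;
	int	alias[5];
};

/* Number of ints between the start of a and the start of alias. */
size_t ints_before_alias(void)
{
	return (offsetof(struct frame, alias) - offsetof(struct frame, a))
	    / sizeof(int);
}
```

Even here, indexing alias[-2] is still an out-of-bounds access as far as the
language is concerned.  The struct only fixes the order; to touch a, name a.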

Paul was saying that using a negative index in the array creates an alias,
or another name, for the scalar integer on the stack (his description made
me understand, for the first time in decades, why compiler writers hate
aliases and I get it now).  Aliases mess hard with optimizers.  Optimizers
may reorder the stack for better cache line usage and what you think
array[-2] means doesn't work any more unless the optimizer catches that 
you made an alias and preserves it.
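
A negative index is not a problem in itself.  It is well defined when the
pointer points into the middle of a single array, because then there is no
second variable being aliased.  A minimal sketch, with two_back() as a
made-up name:

```c
/*
 * Return the element two slots before p.  Only valid when p points
 * at least two elements into an array.
 */
int two_back(const int *p)
{
	return p[-2];		/* same as *(p - 2) */
}
```

Given int buf[5] = { 10, 20, 30, 40, 50 }, calling two_back(&buf[2]) reads
buf[0] and stays inside buf, so the optimizer has nothing to break.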

Paul, how did I do?  I'm not a compiler guy, just had to learn enough to
walk the stack when the kernel panics.

