[TUHS] Maximum Array Sizes in 16 bit C

G. Branden Robinson g.branden.robinson at gmail.com
Sat Sep 21 08:19:12 AEST 2024


Hi Bakul & Warner,

At 2024-09-20T13:16:24-0700, Bakul Shah wrote:
> You are a bit late with your screed.

...I had hoped that my awareness of that was made evident by my citation
of a 30-year-old book.  ;-)

> You will find posts with similar sentiments starting back in 1980s in
> Usenet groups such as comp.lang.{c,misc,pascal}.

Before my time, but I don't doubt it.  The sad thing is that not enough
people took such posts seriously.

I spend a fair amount of time dealing with "legacy" code.  Stuff that
hasn't been touched in a long time.  One thing I'm convinced of: bad
idioms are forever.  And that means people will keep learning and
copying them.

Of course no one wants to pay for the cleanup of such technical debt,
not in spite of but _because_ it will expose bugs.  You can't justify to
any manager that we need to set up this one cost center so that we can
expand another one.

Not unless the manager cares about downside risk.  And tech culture
absolutely does not.  Let the planes fall out of the sky and the
reactors melt down.  You can justify it all in the name of "ethical
altruism", or whatever the trendy label for sociopathic anarcho-
capitalism is these days.

(I'm kidding, of course.  Serious tech bros understand the essential
function of government in maintaining structures for the allocation of
economic rents [copyrights and patents] and the utility of employment
law, police, and if it comes to it, the National Guard in the
suppression of organized labor.  Fortunately for management, software
engineers think so highly of themselves that they identify with the
billionaire CEO's economic class instead of their own.)

> Perhaps a more interesting (but likely pointless) question is what is
> the *least* that can be done to fix C's major problems.

Not pointless.  If we ask ourselves that question after every revision
of the language standard, the language _will_ advance.  C23 has a
`nullptr` constant.  K&R-style function declarations are gone, and good
riddance.  I did notice that some national bodies fought like hell to
keep trigraphs, though.  :-|

> Compilers can easily add bounds checking for the array[index]

Pascal expected this.  One of Kernighan's complaints in his CSTR #100
paper (the one I mentioned) is that he feared precious machine cycles
would be lost validating expressions that pointed within valid bounds.

So why not a compiler switch, jeez louise?  Develop in paranoid/slow
mode and ship in sloppy/fast mode.  If you must.
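
A minimal sketch of the paranoid/fast idea, done by hand in user code
rather than by the compiler.  The PARANOID macro and the names AT,
checked_index, index_in_bounds, and sum_first are all hypothetical,
invented for illustration; nothing here is a real compiler feature.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns nonzero when i is a valid index into an array of len elements. */
static int index_in_bounds(size_t i, size_t len)
{
    return i < len;
}

/* Paranoid-mode helper: abort with a diagnostic on a bad index. */
static size_t checked_index(size_t i, size_t len, const char *file, int line)
{
    if (!index_in_bounds(i, len)) {
        fprintf(stderr, "%s:%d: index %zu out of bounds (length %zu)\n",
                file, line, i, len);
        abort();
    }
    return i;
}

/* Hypothetical switch: build with -DPARANOID while developing,
 * leave it undefined to ship the unchecked version. */
#ifdef PARANOID
#define AT(arr, len, i) ((arr)[checked_index((i), (len), __FILE__, __LINE__)])
#else
#define AT(arr, len, i) ((arr)[(i)])
#endif

/* Example use: sum the first n elements of an array of len ints. */
static int sum_first(const int *a, size_t len, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += AT(a, len, i);
    return total;
}
```

With PARANOID defined, sum_first(a, 4, 5) aborts with a file and line
number; without it, the same call silently reads past the array.  For
true arrays, GCC and Clang already offer something in this spirit via
-fsanitize=bounds.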

It seems that static analysis was in its infancy back then.  Compiler
writers screeched like banshees at the forms of validation the Ada
language spec required them to do, and complained so vociferously that
they helped trash the language's reputation.  A few years went by and,
gosh, folks realized that you sure could prevent a lot of bugs by wiring
such checks into compilers for other languages--in the places where the
semantics would permit it, a count that was invariably lower than Ada's
because, shock, Ada was actually thought out and went through several
revisions _before_ being put into production.

Did anyone ever repent of their loathsome shrieking?  Doubt it.  Static
analysis became cool and they accepted whatever plaudits fell upon them.

> construct but ptr[index] can not be checked, unless we make
> a ptr a heavy weight object such as (address, start, limit).

Base and bounds registers are an old idea.  Older than C.  But the
PDP-11 didn't have them,[1] so C expected to do without and the rest is
lamentable history.
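
For concreteness, here is a rough sketch of the (address, start, limit)
representation as an ordinary C struct.  The type fat_ptr and the
functions fat_from_array, fat_add, and fat_load are hypothetical names
made up for this example; a real implementation would live in the
compiler or the hardware, not in user code.

```c
#include <stddef.h>

/* A "heavyweight" pointer: current address plus the bounds it may roam. */
typedef struct {
    int *addr;   /* current position, or NULL once arithmetic went wrong */
    int *start;  /* first element of the underlying object */
    int *limit;  /* one past the last element */
} fat_ptr;

/* Build a fat pointer from an array (or a malloc'd block) of count ints.
 * The caller supplies count; that is exactly the knowledge the compiler
 * lacks in the "Foo *p = malloc(size);" case. */
static fat_ptr fat_from_array(int *base, size_t count)
{
    fat_ptr p = { base, base, base + count };
    return p;
}

/* Checked pointer arithmetic: the result may be anywhere in
 * [start, limit]; anything else invalidates the pointer.  The test is
 * done on offsets so that no out-of-range pointer is ever formed. */
static fat_ptr fat_add(fat_ptr p, ptrdiff_t n)
{
    if (p.addr != NULL) {
        ptrdiff_t pos = p.addr - p.start;
        ptrdiff_t len = p.limit - p.start;
        if (n < -pos || n > len - pos)
            p.addr = NULL;
        else
            p.addr += n;
    }
    return p;
}

/* Checked load: stores the pointed-to value in *out and returns 1,
 * or returns 0 if the pointer is invalid or sits at the limit. */
static int fat_load(fat_ptr p, int *out)
{
    if (p.addr == NULL || p.addr < p.start || p.addr >= p.limit)
        return 0;
    *out = *p.addr;
    return 1;
}
```

Every pointer is now three words wide and every dereference carries a
comparison or two, which is the cost being weighed above.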

We would do well to learn from C++'s multiple attempts at "smart
pointers".  I guess they've got it right in C++11, at last?  Not sure.

C++'s aggressive promiscuity has not done C a favor, but rather
conditioned the latter into reflexive, instead of reasoned,
conservatism.

> One can see how code can be generated for code such as this:
> 
> 	Foo x[count];
> 	Foo* p = x + n; // or &x[n]
> 
> Code such as "Foo *p = malloc(size);" would require the compiler to
> know how malloc behaves to be able to compute the limit.

C's refusal to specify dynamic memory allocation in the language runtime
(as opposed to, eventually, the standard library) was a painful
oversight.  There was a strange tension between that and code idioms
among C's own earliest practitioners to assume dynamically sized
storage.  I remember when novice C programmers managing strings would
get ridiculed by their seniors for setting up and writing to static
buffers.  Why did they do that?  Because it was easy--the language
supported it well.  Going to `malloc()` was like aiming a gun at your
own face.

The routine practice of memory overcommit in C on Unix systems led to a
sort of perverse synergy.  Programmers were actively conditioned
_against_ performing algorithmic analysis of their _space_ requirements.
(By contrast, seeing how far you could bro down your code's _time_
complexity was where you really showed your mettle.  If you spent all of
the time you saved waiting on I/O, hey man, that's not YOUR problem.)

> But for a user to write a similar function will require some language
> extension.
> 
> [Of course, if we did that, adding proper support for multidimensional
> slices would be far easier. But that is an exploration for another
> day!]

When I read about Fortran 90/95/2003's facilities for array reshaping, I
rocked back on my heels.

> Converting enums to behave like Pascal scalars would likely break
> things. The question is, can such breakage be fixed automatically (by
> source code conversion)?

I don't assert that C needs to ape _Pascal_ scalars in particular.
Better Ada's.  :P  Or, equivalently, C++11's "enum class".  As with many
things in C++, the syntax is an ugly graft, but the idea is as sound as
they come.

One of the proposals that didn't make it for C23 was similarly ugly:
"return break;".  But the _idea_ was to mark tail recursion so that the
compiler would know it's happening.  That saves stack.  _That's_ worth
having.  I worry that it didn't make it just because the syntax was so
cringey.  But the alternatives, like yet another new keyword, or
overloading punctuation some more, seemed worse.  C++ indulges both
vices amply with every revision.
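
To illustrate what is at stake, here is a small sketch in standard C;
the function names sum_to_rec and sum_to_loop are hypothetical.  The
first function makes its recursive call in tail position.  Standard C
does not oblige a compiler to reuse the stack frame there, so the
second function shows, roughly, what a compiler that does eliminate
the tail call would generate.

```c
/* Tail-recursive: the recursive call is the last thing the function
 * does, so nothing in the current frame is needed afterward. */
static unsigned long long sum_to_rec(unsigned long long n,
                                     unsigned long long acc)
{
    if (n == 0)
        return acc;
    return sum_to_rec(n - 1, acc + n);  /* tail call */
}

/* The same computation with the tail call turned into a jump:
 * constant stack use no matter how large n is. */
static unsigned long long sum_to_loop(unsigned long long n,
                                      unsigned long long acc)
{
    while (n != 0) {
        acc += n;
        n -= 1;
    }
    return acc;
}
```

Without a guarantee, sum_to_rec may need a stack frame per step at low
optimization levels; a marker in the language would let the programmer
insist on the second behavior.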

> C's union type is used in two different ways: 1: similar to a sum
> type, which can be done type safely and 2: to cheat. The compiler
> should produce a warning when it can't verify a typesafe use -- one
> can add "unsafe" or some such to let the user absolve the compiler of
> such check.

Agreed.  C++'s family of typecasting operators is, once again, an ugly
feature syntactically, but the benefits in terms of saying what you
mean, and _only_ what you mean, are valuable.

Casts in C are too often an express ticket to UB.
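
As a sketch of the first, type-safe use of a union (the sum type), here
is the usual hand-rolled tagged union in C.  All names (val_tag, value,
value_int, value_double, value_get_int, value_get_double) are
hypothetical.  Nothing in the language enforces the discipline; the
accessors merely make the check hard to forget.

```c
/* Tag saying which union member is currently live. */
typedef enum { VAL_INT, VAL_DOUBLE } val_tag;

typedef struct {
    val_tag tag;
    union {
        int    i;
        double d;
    } as;
} value;

/* Constructors set the tag and the matching member together. */
static value value_int(int i)
{
    value v;
    v.tag = VAL_INT;
    v.as.i = i;
    return v;
}

static value value_double(double d)
{
    value v;
    v.tag = VAL_DOUBLE;
    v.as.d = d;
    return v;
}

/* Accessors refuse to read a member other than the live one.
 * Each returns 1 and fills *out on success, 0 on a tag mismatch. */
static int value_get_int(const value *v, int *out)
{
    if (v->tag != VAL_INT)
        return 0;
    *out = v->as.i;
    return 1;
}

static int value_get_double(const value *v, double *out)
{
    if (v->tag != VAL_DOUBLE)
        return 0;
    *out = v->as.d;
    return 1;
}
```

Reading v.as.d directly after storing an int is the second use, the
cheat, and it is that access a compiler warning (or an explicit
"unsafe" marker) would single out.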

> [May be naively] I tend to think one can evolve C this way and fix a
> lot of code &/or make a lot of bugs more explicit.

If that be naïveté, let's have more of it.

At 2024-09-20T21:58:26+0100, Warner Losh wrote:
> The CHERI architecture extensions do this. It pushes this info into
> hardware where all pointers point to a region (gross simplification)
> that also grant you rights to the area (including read/write/execute).
> It's really cool, but it does come at a cost in performance. Each
> pointer is a pointer, and a capability that's basically a
> cryptographically signed bit of data that's the bounds and access
> permissions associated with the pointer. There's more details on their
> web site:
> https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

CHERI is absolutely cool and even if it doesn't conquer the world, I
feel sure that there is a lot we can learn from it.

> CHERI-BSD is a FreeBSD variant that runs on both CHERI variants
> (aarch64 and riscv64) and where most of the research has been done.
> There's also a Linux variant as well.
> 
> Members of this project know way too many of the corner cases of the C
> language from porting most popular software to the CHERI...  And have
> gone on screeds of their own. The only one I can easily find is
> https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf

Oh yes.  I remember they presented at the LF's Open Source Summit one
year (maybe the last year it was in downtown San Francisco, before the
LF moved the conference to wine country to scrape off all the engineers
and other tedious techy types who might point out what's wrong with
somebody's grandiose sales pitch--conferences are for getting deals
done [too many vice cops in SF?], not advancing the state of the art).

It was a questionnaire along the lines of "what do you _really_ know
about C?" and it opened my eyes wide for sure.

Apparently it turns out that the Dunning-Kruger effect isn't what we
think it is.

https://www.scientificamerican.com/article/the-dunning-kruger-effect-isnt-what-you-think-it-is/

Maybe D&K's findings were so rapidly assimilated into the cultural
zeitgeist because far too many people are acquainted with highly
confident C programmers.

While preparing this message, I ran across this:

https://csrc.nist.gov/files/pubs/conference/1998/10/08/proceedings-of-the-21st-nissc-1998/final/docs/early-cs-papers/schi75.pdf

"The Design and Specification of a Security Kernel for the PDP-11/45",
by Schiller (1975).

I'll try to read and absorb its 117 pages before burdening this list
with any more of my yammerings.  Happy weekend!

Regards,
Branden

[1]  I think.  The PDP-11/20 infamously didn't have memory protection of
     any sort, and the CSRC wisely ran away from that as fast as they
     could once they could afford to.  (See the preface to the Third
     Edition Programmer's Manual.)  And it was reasonable to not expect
     support for such things if one wanted portability to embedded
     systems, but it's not clear to me how seriously the portability of
     C itself was considered until the first ports were actually _done_,
     and these were not to embedded systems, but to machines broadly
     comparable to PDP-11s.  London and Reiser's paper on Unix/32V
     opened my eyes with respect to just how late some portability-
     impacting changes to "K&R C" were actually made.  They sounded many
     cautionary notes that the community--or maybe it was just compiler
     writers (banshees again?)--seemed slow to acknowledge.