[TUHS] Minimum Array Sizes in 16 bit C (was Maximum)

Sun Sep 29 09:30:20 AEST 2024

On Sat, Sep 28, 2024, 5:05 PM Rob Pike <robpike at gmail.com> wrote:

> I wrote a letter to the ANSI C (1989) committee.
>
> Please allow malloc(0).
> Please allow zero-length arrays.
>
> I got two letters back, saying that malloc(0) is illegal because
> zero-length arrays are illegal, and the other vice versa.
>
> I fumed.
>

And now we have zero length arrays an UB malloc(0).

Warner

-rob
>
>
> On Sun, Sep 29, 2024 at 8:08 AM Douglas McIlroy <
> douglas.mcilroy at dartmouth.edu> wrote:
>
>> I have to concede Branden's "gotcha". Struct copying is definitely not
>> O(1).
>>
>> A real-life hazard of non-O(1) operations. Vic Vyssotsky put bzero (I
>> forget what Vic called it) into Fortran II. Sometime later, he found that a
>> percolation simulation was running annoyingly slowly. It took some study to
>> discover that the real inner loop of the program was not the percolation,
>> which touched only a small fraction of a 2D field. More time went into the
>> innocuous bzero initializer. The fix was to ADD code to the inner loop to
>> remember what entries had been touched, and initialize only those for the
>> next round of the simulation.
>>
>> > for a while, every instruction has [sic] an exactly predictable and
>> constant cycle
>> >  count. ...[Then] all of a sudden you had instructions with O(n) cycle
>> counts.
>>
>> O(n) cycle counts were nothing  new. In the 1950s we had the IBM 1620
>> with arbitrary-length arithmetic and the 709 with "convert" instructions
>> whose cycle count went up to 256.
>>
>> >> spinelessly buckled to allow malloc(0) to return 0, as some
>> >> implementations gratuitously did.
>>
>> > What was the alternative?  There was no such thing as an exception, and
>> > if a pointer was an int and an int was as wide as a machine address,
>> > there'd be no way to indicate failure in-band, either.
>>
>> What makes you think allocating zero space should fail? If the size n of
>> a set is determined at run time, why should one have to special-case its
>> space allocation when n=0?  Subsequent processing of the form for(i=0; i<n;
>> i++) {...} will handle it gracefully with no special code. Malloc should do
>> as it did in v7--return a non-null pointer different from any other active
>> malloc pointer, as Bakul stated. If worse comes to worst[1] this can be
>> done by padding up to the next feasible size. Regardless of how the pointer
>> is created, any access via it would of course be out of bounds and hence
>> wrong.
>>
>> > How does malloc(0) get this job done and what benefit does it bring?
>>
>> If I understand the "job" (about initializing structure members)
>> correctly, malloc(0) has no bearing on it. The benefit lies elsewhere.
>>
>> Apropos of tail calls, Rob Pike had a nice name for an explicit tail
>> call, "become". It's certainly reasonable, though, to make compilers
>> recognize tail calls implicitly.
>>
>> [1] Worse didn't come to worst in the original malloc. It attached
>> metadata to each block, so even blocks of size zero consumed some memory.
>>
>> Doug
>>
>> On Sat, Sep 28, 2024 at 1:59 PM Bakul Shah via TUHS <tuhs at tuhs.org>
>> wrote:
>>
>>> Just responding to random things that I noticed:
>>>
>>> You don't need special syntax for tail-call. It should be done
>>> transparently when a call is the last thing that gets executed. Special
>>> syntax will merely allow confused people to use it in the wrong place and
>>> get confused more.
>>>
>>> malloc(0) should return a unique ptr. So that "T* a = malloc(0); T* b =
>>> malloc(0); a != (T*)0 && a != b". Without this, malloc(0) acts differently
>>> from malloc(n) for n > 0.
>>>
>>> Note that except for arrays, function arguments & result are copied so
>>> copying a struct makes perfect sense. Passing arrays by reference may have
>>> been due to residual Fortran influence! [Just guessing] Also note: that one
>>> exception has been the cause of many problems.
>>>
>>> In any case you have not argued convincingly about why dynamic memory
>>> allocation should be in the language (runtime) :-) And adding that wouldn't
>>> have fixed any of the existing problems with the language.
>>>
>>> Bakul
>>>
>>> > On Sep 28, 2024, at 9:58 AM, G. Branden Robinson <
>>> g.branden.robinson at gmail.com> wrote:
>>> >
>>> > At 2024-09-28T09:34:14-0400, Douglas McIlroy wrote:
>>> >>> C's refusal to specify dynamic memory allocation in the language
>>> >>> runtime (as opposed to, eventually, the standard library)
>>> >>
>>> >> This complaint overlooks one tenet of C: every operation in what you
>>> >> call "language runtime" takes O(1) time. Dynamic memory allocation
>>> >> is not such an operation.
>>> >
>>> > A fair point.  Let me argue about it anyway.  ;-)
>>> >
>>> > I'd make three observations.  First, K&R did _not_ tout this in their
>>> > book presenting ANSI C.  I went back and checked the prefaces,
>>> > introduction, and the section presenting a simple malloc()/free()
>>> > implementation.  The tenet you claim for the language is not explicitly
>>> > articulated and, if I squint really hard, I can only barely perceive
>>> > (imagine?) it deeply between the lines in some of the prefatory
>>> material
>>> > to which K&R mostly confine their efforts to promote the language.  In
>>> > my view, a "tenet" is something more overt: the sort of thing U.S.
>>> > politicians try to get hung on the walls of every public school
>>> > classroom, like Henry Spencer's Ten Commandments of C[1] (which itself
>>> > does not mention this "core language has only O(1) features"
>>> principle).
>>> >
>>> > Second, in reviewing K&R I was reminded that structure copying is part
>>> > of the language.  ("There are no operations that manipulate an entire
>>> > array or string, although structures may be copied as a unit."[2])
>>> > Doesn't that break the tenet right there?
>>> >
>>> > Third, and following on from the foregoing, your point reminded me of
>>> my
>>> > youth programming non-pipelined machines with no caches.  You could set
>>> > your watch by (just about) any instruction in the set--and often did,
>>> > because we penurious microcomputer guys often lacked hardware real-time
>>> > clocks, too.  That is to say, for a while, every instruction has an
>>> > exactly predictable and constant cycle count.  (The _value_ of that
>>> > constant might depend on the addressing mode, because that would have
>>> > consequences on memory fetches, but the principle stood.)  When the Z80
>>> > extended the 8080's instruction set, they ate from Tree of Knowledge
>>> > with block-copy instructions like LDIR and LDDR, and all of a sudden
>>> you
>>> > had instructions with O(n) cycle counts.  But as a rule, programmers
>>> > seemed to welcome this instead of recognizing it as knowing sin,
>>> because
>>> > you generally knew worst-case how many bytes you'd be copying and take
>>> > that into account.  (The worst worst case was a mere 64kB!)
>>> >
>>> > Further, Z80 home computers in low-end configurations (that is, no disk
>>> > drives) often did a shocking thing: they ran with all interrupts
>>> masked.
>>> > All the time.  The one non-maskable interrupt was RESET, after which
>>> you
>>> > weren't going to be resuming execution of your program anyway.  Not
>>> from
>>> > the same instruction, at least.  As I recall the TRS-80 Model I/III/4
>>> > didn't even have logic on the motherboard to decode the Z80's
>>> "interrupt
>>> > mode 2", which was vectored, I think.  Even in the "high-end"
>>> > configurations of these tiny machines, you got a whopping ONE interrupt
>>> > to play with ("IM 1").
>>> >
>>> > Later, when the Hitachi 6309 smuggled similar block-transfer decadence
>>> > into its extensions to the Motorola 6809 (to the excitement of we
>>> > semi-consciously Unix-adjacent OS-9 users) they faced a starker
>>> problem,
>>> > because the 6809 didn't wall off interrupts in the same way the 8080
>>> and
>>> > Z80.  They therefore presented the programmer with the novelty of the
>>> > restartable instruction, and a new generation of programmers became
>>> > acquainted with the hard lessons time-sharing minicomputer people were
>>> > familiar with.
>>> >
>>> > My point in this digression is that, in my opinion, it's tough to hold
>>> > fast to the O(1) tenet you claim for C's core language and to another
>>> at
>>> > the same time: the language's identity as a "portable assembly
>>> > language".  Unless every programmer has control over the compiler--and
>>> > they don't--you can't predict when the compiler will emit an O(n) block
>>> > transfer instruction.  You'll just have to look at the disassembly.
>>> >
>>> > _Maybe_ you can retain purity by...never copying structs.  I don't
>>> think
>>> > lint or any other tool ever checked for this.  Your advocacy of this
>>> > tenet is the first time I've heard it presented.
>>> >
>>> > If you were to suggest to me that most of the time I've spent in my
>>> life
>>> > arguing with C advocates was with rotten exemplars of the species and
>>> > therefore was time wasted, I would concede the point.
>>> >
>>> > There's just so danged _many_ of them...
>>> >
>>> >> Your hobbyhorse awakened one of mine.
>>> >>
>>> >> malloc was in v7, before the C standard was written. The standard
>>> >> spinelessly buckled to allow malloc(0) to return 0, as some
>>> >> implementations gratuitously did.
>>> >
>>> > What was the alternative?  There was no such thing as an exception, and
>>> > if a pointer was an int and an int was as wide as a machine address,
>>> > there'd be no way to indicate failure in-band, either.
>>> >
>>> > If the choice was that or another instance of atoi()'s wincingly awful
>>> > "does this 0 represent an error or successful conversion of a zero
>>> > input?" land mine, ANSI might have made the right choice.
>>> >
>>> >> I can't imagine that any program ever actually wanted the feature. Now
>>> >> it's one more undefined behavior that lurks in thousands of programs.
>>> >
>>> > Hoare admitted to only one billion-dollar mistake.  No one dares count
>>> > how many to write in C's ledger.  This was predicted, wasn't it?
>>> > Everyone loved C because it was fast: it was performant, because it
>>> > never met a runtime check it didn't eschew--recall again Kernighan
>>> > punking Pascal on this exact point--and it was quick for the programmer
>>> > to write because it never met a _compile_-time check it didn't eschew.
>>> > C was born as a language for wizards who never made mistakes.
>>> >
>>> > The problem is that, like James Madison's fictive government of angels,
>>> > such entities don't exist.  The staff of the CSRC itself may have been
>>> > overwhelmingly populated with frank, modest, and self-deprecating
>>> > people--and I'll emphasize here that I'm aware of no accounts that this
>>> > is anything but true--but C unfortunately played a part in stoking a
>>> > culture of pretension among software developers.  "C is a language in
>>> > which wizards program.  I program in C.  Therefore I'm a wizard." is
>>> how
>>> > the syllogism (spot the fallacy) went.  I don't know who does more
>>> > damage--the people who believe their own BS, or the ones who know
>>> > they're scamming their colleagues.
>>> >
>>> >> There are two arguments for malloc(0), Most importantly, it caters for
>>> >> a limiting case for aggregates generated at runtime--an instance of
>>> >> Kernighan's Law, "Do nothing gracefully". It also provides a way to
>>> >> create a distinctive pointer to impart some meta-information, e.g.
>>> >> "TBD" or "end of subgroup", distinct from the null pointer, which
>>> >> merely denotes absence.
>>> >
>>> > I think I might be confused now.  I've frequently seen arrays of
>>> structs
>>> > initialized from lists of literals ending in a series of "NULL"
>>> > structure members, in code that antedates or ignores C99's wonderful
>>> > feature of designated initializers for aggregate types.[3]  How does
>>> > malloc(0) get this job done and what benefit does it bring?
>>> >
>>> > Last time I posted to TUHS I mentioned a proposal for explicit
>>> tail-call
>>> > elimination in C.  I got the syntax wrong.  The proposal was "return
>>> > goto;".  The WG14 document number is N2920 and it's by Alex Gilding.
>>> > Good stuff.[4]  I hope we see it in C2y.
>>> >
>>> > Predictably, I must confess that I didn't make much headway on
>>> > Schiller's 1975 "secure kernel" paper.  Maybe next time.
>>> >
>>> > Regards,
>>> > Branden
>>> >
>>> > [1] https://web.cs.dal.ca/~jamie/UWO/C/the10fromHenryS.html
>>> >
>>> >    I can easily imagine that the tenet held at _some_ point in the
>>> >    C's history.  It's _got_ to be the reason that the language
>>> >    relegates memset() and memcpy() to the standard library (or to the
>>> >    programmer's own devise)!  :-O
>>> >
>>> > [2] Kernighan & Ritchie, _The C Programming Language_, 2nd edition, p.
>>> 2
>>> >
>>> >    Having thus admitted the camel's nose to the tent, K&R would have
>>> >    done the world a major service by making memset(), or at least
>>> >    bzero(), a language feature, the latter perhaps by having "= 0"
>>> >    validly apply to an lvalue of non-primitive type.  Okay,
>>> >    _potentially_ a major service.  You'd still need the self-regarding
>>> >    wizard programmers to bother coding it, which they wouldn't in many
>>> >    cases "because speed".  Move fast, break stuff.
>>> >
>>> >    C++ screwed this up too, and stubbornly stuck by it for a long time.
>>> >
>>> >    https://cplusplus.github.io/CWG/issues/178.html
>>> >
>>> > [3] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
>>> > [4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2920.pdf
>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240928/87b8b17b/attachment.htm>