[TUHS] Maximum Array Sizes in 16 bit C
Rob Pike
robpike at gmail.com
Sat Sep 21 07:18:28 AEST 2024
Here is some code from typo.
int table[2]; /*keep these four cards in order*/
int tab1[26];
int tab2[730];
char tab3[19684];
...
er = read(salt,table,21200);
Note the use of the word 'card'.
The past is a different country.
-rob
On Sat, Sep 21, 2024 at 7:07 AM Warner Losh <imp at bsdimp.com> wrote:
>
>
> On Fri, Sep 20, 2024 at 9:16 PM Bakul Shah via TUHS <tuhs at tuhs.org> wrote:
>
>> You are a bit late with your screed. You will find posts
>> with similar sentiments starting back in 1980s in Usenet
>> groups such as comp.lang.{c,misc,pascal}.
>>
>> Perhaps a more interesting (but likely pointless) question
>> is what is the *least* that can be done to fix C's major
>> problems.
>>
>> Compilers can easily add bounds checking for the array[index]
>> construct but ptr[index] cannot be checked, unless we make
>> the ptr a heavyweight object such as (address, start, limit).
>> One can see how code can be generated for code such as this:
>>
>> Foo x[count];
>> Foo* p = x + n; // or &x[n]
>>
>> Code such as "Foo *p = malloc(size);" would require the
>> compiler to know how malloc behaves to be able to compute
>> the limit. But for a user to write a similar function will
>> require some language extension.
>>
>> [Of course, if we did that, adding proper support for
>> multidimensional slices would be far easier. But that
>> is an exploration for another day!]
>>
>
> The CHERI architecture extensions do this. It pushes this info into
> hardware, where all pointers point to a region (gross simplification)
> that also grants you rights to the area (including read/write/execute).
> It's really cool, but it does come at a cost in performance. Each
> pointer is a pointer plus a capability: basically a hardware-protected
> bit of data encoding the bounds and access permissions associated
> with the pointer. There's more details on their web site:
> https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
>
> CHERI-BSD is a FreeBSD variant that runs on both CHERI variants
> (aarch64 and riscv64) and is where most of the research has been done.
> There's also a Linux variant.
>
> Members of this project know way too many of the corner cases of the C
> language from porting most popular software to CHERI... and have gone
> on screeds of their own. The only one I can easily find is
>
> https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf
>
> Warner
>
>
>> Converting enums to behave like Pascal scalars would
>> likely break things. The question is, can such breakage
>> be fixed automatically (by source code conversion)?
>>
>> C's union type is used in two different ways: 1: similar
>> to a sum type, which can be done type safely and 2: to
>> cheat. The compiler should produce a warning when it can't
>> verify a typesafe use -- one can add "unsafe" or some such
>> to let the user absolve the compiler of such checks.
>>
>> [May be naively] I tend to think one can evolve C this way
>> and fix a lot of code &/or make a lot of bugs more explicit.
>>
>> > On Sep 20, 2024, at 10:11 AM, G. Branden Robinson
>> > <g.branden.robinson at gmail.com> wrote:
>> >
>> > At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:
>> >> Unless I'm mistaken (quite possible at my age), the OP was referring
>> >> to that in C, pointers and arrays are pretty much the same thing i.e.
>> >> "foo[-2]" means "take the pointer 'foo' and go back two things"
>> >> (whatever a "thing" is).
>> >
>> > "in C, pointers and arrays are pretty much the same thing" is a common
>> > utterance but misleading, and in my opinion, better replaced with a
>> > different one.
>> >
>> > We should instead say something more like:
>> >
>> > In C, pointers and arrays have compatible dereference syntaxes.
>> >
>> > They do _not_ have compatible _declaration_ syntaxes.
>> >
>> > Chapter 4 of van der Linden's _Expert C Programming: Deep C Secrets_
>> > (1994) tackles this issue head-on and at length.
>> >
>> > Here's the salient point.
>> >
>> > "Consider the case of an external declaration `extern char *p;` but a
>> > definition of `char p[10];`. When we retrieve the contents of `p[i]`
>> > using the extern, we get characters, but we treat it as a pointer.
>> > Interpreting ASCII characters as an address is garbage, and if you're
>> > lucky the program will coredump at that point. If you're not lucky it
>> > will corrupt something in your address space, causing a mysterious
>> > failure at some point later in the program."
>> >
>> >> C is just a high level assembly language;
>> >
>> > I disagree with this common claim too. Assembly languages correspond to
>> > well-defined machine models.[1] Those machine models have memory
>> > models. C has no memory model--deliberately, because that would have
>> > gotten in the way of performance. (In practice, C's machine model was
>> > and remains the PDP-11,[2] with aspects thereof progressively sanded off
>> > over the years in repeated efforts to salvage the language's reputation
>> > for portability.)
>> >
>> >> there is no such object as a "string" for example: it's just an "array
>> >> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof").
>> >
>> > Yeah, it turns out we need a well-defined string type much more
>> > powerfully than, it seems, anyone at the Bell Labs CSRC appreciated.
>> > string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the
>> > end of the 1970s and C aficionados have defended the language's
>> > purported perfection with such vigor that they annexed the haphazardly
>> > assembled standard library into the territory that they defend with much
>> > rhetorical violence and overstatement. From useless or redundant return
>> > values to const-carelessness to Schlemiel the Painter algorithms in
>> > implementations, it seems we've collectively made every mistake that
>> > could be made with Nelson's original, minimal API, and taught those
>> > mistakes as best practices in tutorials and classrooms. A sorry affair.
>> >
>> > So deep was this disdain for the string as a well-defined data type,
>> > and moreover one conceptually distinct from an array (or vector) of
>> > integral types, that Stroustrup initially repeated the mistake in
>> > C++. People can
>> > easily roll their own, he seemed to have thought. Eventually he thought
>> > again, but C++ took so long to get standardized that by then, damage was
>> > done.
>> >
>> > "A string is just an array of `char`s, and a `char` is just a
>> > byte"--another hasty equivalence that surrendered a priceless hostage to
>> > fortune. This is the sort of fallacy indulged by people excessively
>> > wedded to machine language programming and who apply its perspective to
>> > every problem statement uncritically.
>> >
>> > Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow"
>> > characters, and "base" vs. "combining" characters, the champions of the
>> > "portable assembly" paradigm charged like Lord Cardigan into the pike
>> > and musket lines of the character type as one might envision it in a
>> > machine register. (This insistence on visualizing register-level
>> > representations has prompted numerous other stupidities, like the use of
>> > an integral zero at the _language level_ to represent empty, null, or
>> > false literals for as many different data types as possible. "If it
>> > ends up as a zero in a register," the thinking appears to have gone, "it
>> > should look like a zero in the source code." Generations of code--and
>> > language--cowboys have screwed us all over repeatedly with this hasty
>> > equivalence.)
>> >
>> > Type theorists have known better for decades. But type theory is (1)
>> > hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy
>> > day in the sun (for which we may be grateful), which means it is
>> > seldom on the path one anticipates to a comfortable retirement from a
>> > Silicon Valley tech company (or several) on a private yacht.
>> >
>> > Why do I rant so splenetically about these issues? Because the result
>> > of such confusion is _bugs in programs_. You want something concrete?
>> > There it is. Data types protect you from screwing up. And the better
>> > your data types are, the more care you give to specifying what sorts of
>> > objects your program manipulates, the more thought you give to the
>> > invariants that must be maintained for your program to remain in a
>> > well-defined state, the fewer bugs you will have.
>> >
>> > But, nah, better to slap together a prototype, ship it, talk it up to
>> > the moon as your latest triumph while interviewing with a rival of the
>> > company you just delivered that prototype to, and look on in amusement
>> > when your brilliant achievement either proves disastrous in deployment
>> > or soaks up the waking hours of an entire team of your former colleagues
>> > cleaning up the steaming pile you voided from your rock star bowels.
>> >
>> > We've paid a heavy price for C's slow and seemingly deeply grudging
>> > embrace of the type concept. (The lack of controlled scope for
>> > enumeration constants is one example; the horrifyingly ill-conceived
>> > choice of "typedef" as a keyword indicating _type aliasing_ is another.)
>> > Kernighan did not help by trashing Pascal so hard in about 1980. He was
>> > dead right that Pascal needed, essentially, polymorphic subprograms in
>> > array types. Wirth not speccing the language to accommodate that back
>> > in 1973 or so was a sad mistake. But Pascal got a lot of other stuff
>> > right--stuff that the partisanship of C advocates refused to countenance
>> > such that they ended up celebrating C's flaws as features. No amount of
>> > Jonestown tea could quench their thirst. I suspect the truth was more
>> > that they didn't want to bother having to learn any other languages.
>> > (Or if they did, not any language that anyone else on their team at work
>> > had any facility with.) A rock star plays only one instrument, no?
>> > People didn't like it when Eddie Van Halen played keyboards instead of
>> > guitar on stage, so he stopped doing that. The less your coworkers
>> > understand your work, the more of a genius you must be.
>> >
>> > Now, where was I?
>> >
>> >> What's the length of "abc" vs. how many bytes are needed to store it?
>> >
>> > Even what is meant by "length" has several different correct answers!
>> > Quantity of code points in the sequence? Number of "grapheme clusters"
>> > a.k.a. "user-perceived characters" as Unicode puts it? Width as
>> > represented on the output device? On an ASCII device these usually had
>> > the same answer (control characters excepted). But even at the Bell
>> > Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't
>> > necessarily have to. (How wide is an em dash? How many bytes represent
>> > it, in the formatting language and in the output language?)
>> >
>> >> Giggle... In a device driver I wrote for V6, I used the expression
>> >>
>> >> "0123"[n]
>> >>
>> >> and the two programmers whom I thought were better than me had to ask
>> >> me what it did...
>> >>
>> >> -- Dave, brought up on PDP-11 Unix[*]
>> >
>> > I enjoy this application of that technique, courtesy of Alan Cox.
>> >
>> > fsck-fuzix: blow 90 bytes on a progress indicator
>> >
>> > static void progress(void)
>> > {
>> >     static uint8_t progct;
>> >
>> >     progct++;
>> >     progct &= 3;
>> >     printf("%c\010", "-\\|/"[progct]);
>> >     fflush(stdout);
>> > }
>> >
>> >> I still remember the days of BOS/PICK/etc, and I staked my career on
>> >> Unix.
>> >
>> > Not a bad choice. Your exposure to and recollection of other ways of
>> > doing things, I suspect, made you a more valuable contributor than those
>> > who mazed themselves with thoughts of "the Unix way" to the point that
>> > they never seriously considered any other.
>> >
>> > It's fine to prefer "the C way" or "the Unix way", if you can
>> > intelligibly define what that means as applied to the issue in dispute,
>> > and coherently defend it. Demonstrating an understanding of the
>> > alternatives, and being able to credibly explain why they are inferior
>> > approaches, is how to do advocacy correctly.
>> >
>> > But it is not the cowboy way. The rock star way.
>> >
>> > Regards,
>> > Branden
>> >
>> > [1] Unfortunately I must concede that this claim is less true than it
>> > used to be thanks to the relentless pursuit of trade-secret means of
>> > optimizing hardware performance. Assembly languages now correspond,
>> > particularly on x86, to a sort of macro language that imperfectly
>> > masks a massive amount of microarchitectural state that the
>> > implementors themselves don't completely understand, at least not in
>> > time to get the product to market. Hence the field day of
>> > speculative execution attacks and similar. It would not be fair to
>> > say that CPUs of old had _no_ microarchitectural state--the Z80, for
>> > example, had the not-completely-official `W` and `Z` registers--but
>> > they did have much less of it, and correspondingly less attack
>> > surface for screwing your programs. I do miss the days of
>> > deterministic cycle counts for instruction execution. But I know
>> > I'd be sad if all the caches on my workaday machine switched off.
>> >
>> > [2] https://queue.acm.org/detail.cfm?id=3212479
>>
>>