Ronald Natalie wrote in
<5DB09C5A-F5DA-4375-AAA5-0711FC6FB1D9(a)ronnatalie.com>:
|> On May 15, 2020, at 7:34 PM, Steffen Nurpmeso <steffen(a)sdaoden.eu> wrote:
|> ron(a)ronnatalie.com wrote in
|> <077a01d62b08$e696bee0$b3c43ca0$(a)ronnatalie.com>:
|>|Char is different. One of the silly foibles of C. char can be
|>|signed or unsigned at the implementation's decision.
|>
|> And I wish Thompson and Pike had felt the need to
|> design UTF-8 ten years earlier. Maybe then we would have a halfway
|> usable "wide" character interface in the standard (C) library.
|The issue is making char play double duty as a basic storage unit and
|a native character.
|This means you can never have 16-bit (or 32-bit) chars on any machine
|on which you wanted to support 8-bit integers.
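Ron's point about char doing double duty can be illustrated with a short sketch in standard C. sizeof(char) is 1 by definition and CHAR_BIT gives its width; CHAR_MIN reveals whether this implementation chose a signed plain char; and converting through unsigned char is the usual way to get a byte value that is never negative. The helper names are hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: nonzero if plain char is signed on this
   implementation; the C standard leaves that choice open. */
int plain_char_is_signed(void)
{
	return CHAR_MIN < 0;
}

/* Hypothetical helper: the byte value held in c, always in
   0..UCHAR_MAX, whether plain char is signed or unsigned. */
int byte_value(char c)
{
	return (unsigned char)c;
}
```

On the usual two's-complement implementations with 8-bit chars, (int)(char)0xFF is -1 where plain char is signed and 255 where it is unsigned, while byte_value((char)0xFF) is 255 on both.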
Oh, I am not the person to step in here.
[I deleted 60+ lines about my experiences with char*/void*, typedefs,
etc. And about POSIX specifying that a byte has 8 bits. And that it
will soon also specify that NULL/(void*)0 has all bits 0.]
Unicode / ISO 10646 did not exist back then, sure.
I am undecided. At one time I was a real fan of UTF-32 (32-bit
characters), but when I looked more deeply into Unicode, that turned
out to be false thinking: some languages are so complex that you
need to address entire sentences, or at least encapsulate "grapheme"
boundaries; going for "code points" is just wrong.
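A minimal sketch of why counting code points is not counting characters. This hypothetical helper counts the code points in well-formed UTF-8 by skipping continuation bytes (those of the form 10xxxxxx). The string "e\xCC\x81" is e followed by U+0301 COMBINING ACUTE ACCENT: it displays as one character (é) but holds two code points. Finding real grapheme cluster boundaries needs the Unicode text segmentation rules, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hypothetical helper: number of code points in a NUL-terminated,
   well-formed UTF-8 string. Each code point has exactly one lead
   byte; continuation bytes look like 10xxxxxx and are not counted. */
size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

So the precomposed "\xC3\xA9" (U+00E9) counts as 1, while the visually identical decomposed "e\xCC\x81" counts as 2.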
Then I thought Microsoft and their UTF-16 decision was not that
bad, because almost all real-life characters of Unicode can
nonetheless be addressed by a single 16-bit code unit, and that
eases programming. Moreover, UTF-8 needs three bytes for most of
them.
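The size arithmetic behind this comparison can be sketched as follows. Every code point in the Basic Multilingual Plane (below U+10000) fits in one UTF-16 code unit, but from U+0800 upward it takes three bytes in UTF-8; outside the BMP, UTF-16 needs a surrogate pair, so it is variable-length too. The function names are hypothetical, and the input is assumed to be a valid Unicode scalar value.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed to encode cp in UTF-8
   (cp assumed to be a valid scalar value, <= 0x10FFFF). */
int utf8_length(uint32_t cp)
{
	if (cp < 0x80)
		return 1;	/* ASCII */
	if (cp < 0x800)
		return 2;	/* e.g. Latin-1 supplement, Greek, Cyrillic */
	if (cp < 0x10000)
		return 3;	/* rest of the BMP, e.g. CJK ideographs */
	return 4;		/* supplementary planes, e.g. emoji */
}

/* Hypothetical helper: 16-bit code units needed in UTF-16. */
int utf16_units(uint32_t cp)
{
	return cp < 0x10000 ? 1 : 2;	/* outside the BMP: surrogate pair */
}
```

For example, U+4E2D (中) is one UTF-16 code unit but three UTF-8 bytes, while U+1F600 takes a surrogate pair in UTF-16 and four bytes in UTF-8.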
Why did it happen? Why was the char type overloaded like this?
Why was there no byte or "mem" type? Even today, I think, the way
ISO C lets you bypass its (terrible) aliasing rules is by casting to
char* (or another character type) and accessing the object through it.
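A small sketch of that escape hatch, assuming a C11 compiler, IEEE 754 floats, and 8-bit chars. Reading any object through an unsigned char pointer is one of the accesses the effective-type rules permit, and memcpy() is the other common well-defined route; reading a float through a uint32_t pointer would not be allowed. Both helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sum of the bytes of any object. Accessing the
   object through unsigned char is explicitly allowed by the aliasing
   rules, whatever the object's declared type. */
unsigned byte_sum(const void *obj, size_t size)
{
	const unsigned char *p = obj;
	unsigned sum = 0;

	while (size-- > 0)
		sum += *p++;
	return sum;
}

/* Hypothetical helper: the bit pattern of a float. *(uint32_t *)&f
   would violate the aliasing rules; memcpy() copies the bytes
   instead. Assumes float is 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float assumed");

uint32_t float_bits(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof u);
	return u;
}
```

byte_sum() on a uint32_t holding 0x01020304 gives 10 on both little- and big-endian machines, since only the byte order differs, not the byte values.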
In v5 usr/src/s2/mail.c i see
+getfield(buf)
+char buf[];
+{
+	int j;
+	char c;
+
+	j = 0;
+	while((c = buf[j] = getc(iobuf)) >= 0)
+		if(c==':' || c=='\n') {
+			buf[j] =0;
+			return(1);
+		} else
+			j++;
+	return(0);
+}
So here EOF was handled differently: char was signed, and characters were evidently assumed to be 7-bit, so getc()'s -1 at end of file could be caught with >= 0.
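For comparison, a hypothetical modern version of the same routine. Unlike the original, it takes the stream and a buffer size as parameters instead of using the global iobuf without bounds. In today's C the v5 idiom breaks: where plain char is unsigned, c can never be negative, so the loop never ends at EOF; where it is signed, a 0xFF byte in the input reads as -1 and ends the field early. Keeping getc()'s result in an int and comparing it with EOF avoids both.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical modern getfield(): reads from fp into buf until ':'
   or '\n'. Returns 1 if a delimiter was found, 0 on EOF or when buf
   fills up (the character that did not fit is dropped). buf is
   always NUL-terminated when size > 0. */
int getfield(FILE *fp, char *buf, size_t size)
{
	size_t j = 0;
	int c;	/* int, not char: EOF must differ from every byte value */

	if (size == 0)
		return 0;
	while ((c = getc(fp)) != EOF) {
		if (c == ':' || c == '\n') {
			buf[j] = '\0';
			return 1;
		}
		if (j + 1 >= size)
			break;	/* no room left for the terminator */
		buf[j++] = (char)c;
	}
	buf[j] = '\0';
	return 0;
}
```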
At this point at the latest, I have to admit that I have not looked
at old source code in years. But I just had a quick look in dmr/ of
the 5th Edition, and there you see "char lowbyte", for example.
I wish you, and the list, a nice Sunday from Germany!
--steffen