[TUHS] v7 K&R C

Steffen Nurpmeso steffen at sdaoden.eu
Sun May 17 09:26:07 AEST 2020


Ronald Natalie wrote in
<5DB09C5A-F5DA-4375-AAA5-0711FC6FB1D9 at ronnatalie.com>:
 |> On May 15, 2020, at 7:34 PM, Steffen Nurpmeso <steffen at sdaoden.eu> wrote:
 |> ron at ronnatalie.com wrote in
 |> <077a01d62b08$e696bee0$b3c43ca0$@ronnatalie.com>:
 |>|Char is different.  One of the silly foibles of C.  char can be
 |>|signed or unsigned at the implementation's decision.
 |> 
 |> And I wish Thompson and Pike had felt the need to design
 |> UTF-8 ten years earlier.  Maybe we would have a halfway
 |> usable "wide" character interface in the standard (C) library.

 |The issue is making char play double duty as a basic storage unit
 |and a native character.  This means you can never have 16 (or 32
 |bit) chars on any machine that you wanted to support 8 bit
 |integers on.
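
A minimal illustration of that constraint (standard C, nothing
v7-specific): sizeof(char) is 1 by definition, so a 16-bit "native
character" would force CHAR_BIT to 16 and leave no addressable
8-bit integer at all.

  #include <stdio.h>
  #include <limits.h>

  /* char is the basic storage unit: sizeof(char) == 1 always,
   * and every other object size is a multiple of it.  A 16-bit
   * char therefore means CHAR_BIT == 16. */
  int
  main(void)
  {
          printf("sizeof(char) = %zu, CHAR_BIT = %d\n",
              sizeof(char), CHAR_BIT);
          return 0;
  }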

Oh, I am not the person to step in here.
  [I deleted 60+ lines of char*/void* and typedef experiences
  I had.  And POSIX specifying that a byte has 8 bits.  And,
  soon, that NULL/(void*)0 has all bits 0.]

  Unicode / ISO 10646 did not exist back then, sure.

  I am undecided.  I was a real fan of UTF-32 (32-bit characters)
  at times, but when I looked more deeply into Unicode it turned
  out to be wrong thinking: some scripts are so complex that you
  need to address entire sentences, or at least encapsulate
  "grapheme" boundaries; going by "codepoints" alone is just wrong.

  Then I thought Microsoft and their UTF-16 decision was not that
  bad, because almost all real-life characters of Unicode can
  nonetheless be addressed by a single 16-bit code unit, and that
  eases programming.  And on top of that, UTF-8 needs three bytes
  for most of them.
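
  A toy encoder of my own to show that trade-off (it ignores
  surrogates and everything above U+FFFF): one 16-bit UTF-16 unit
  versus up to three UTF-8 bytes inside the BMP.

  #include <stdio.h>
  #include <stdint.h>

  /* Encode a BMP codepoint (< U+10000, surrogates not checked)
   * as UTF-8; returns the number of bytes written. */
  static int
  utf8_encode(uint32_t cp, unsigned char out[3])
  {
          if (cp < 0x80) {                /* ASCII: 1 byte */
                  out[0] = (unsigned char)cp;
                  return 1;
          } else if (cp < 0x800) {        /* 2 bytes */
                  out[0] = (unsigned char)(0xC0 | (cp >> 6));
                  out[1] = (unsigned char)(0x80 | (cp & 0x3F));
                  return 2;
          }
          out[0] = (unsigned char)(0xE0 | (cp >> 12));    /* 3 bytes */
          out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
          out[2] = (unsigned char)(0x80 | (cp & 0x3F));
          return 3;
  }

  int
  main(void)
  {
          unsigned char buf[3];

          /* U+20AC EURO SIGN: one UTF-16 unit, three UTF-8 bytes. */
          printf("U+20AC: %d UTF-8 byte(s)\n", utf8_encode(0x20AC, buf));
          return 0;
  }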

Why did it happen?  Why was the char type overloaded like this?
Why was there no byte or "mem" type?  To this day, I think, ISO C
allows bypassing its (terrible) aliasing rules by casting to and
from char*.
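
For example, a minimal sketch of that escape hatch as I understand
the standard (the character-type exemption is C11 6.5p7):

  #include <stdio.h>

  /* Accessing any object through an lvalue of character type is
   * exempt from the strict-aliasing rules (C11 6.5p7), so this
   * dump of a double's object representation is well-defined
   * (the byte values themselves are unspecified, of course). */
  int
  main(void)
  {
          double d = 1.0;
          const unsigned char *p = (const unsigned char *)&d;
          size_t i;

          for (i = 0; i < sizeof d; i++)
                  printf("%02x ", p[i]);
          putchar('\n');
          return 0;
  }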

In v5 usr/src/s2/mail.c I see

  getfield(buf)
  char buf[];
  {
          int j;
          char c;

          j = 0;
          while((c = buf[j] = getc(iobuf)) >= 0)
          if(c==':' || c=='\n') {
                  buf[j] = 0;
                  return(1);
          } else
                  j++;
          return(0);
  }

so here EOF was different, and char was signed, holding 7-bit
ASCII, it seems.  At this point, at the latest, I have to admit
that I have not looked into old source code for years.  But I just
had a quick look at the dmr/ tree of the Fifth Edition, and there
you see "char lowbyte", for example.
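
For contrast, a minimal sketch of why the v5 idiom is unsafe in
today's C, where char may be unsigned:

  #include <stdio.h>

  /* getc()/getchar() return int so that EOF (-1) stays distinct
   * from all 256 byte values.  Stored into a plain char, as in
   * the v5 loop above, the test (c >= 0) is always true where
   * char is unsigned, and byte 0xFF is mistaken for EOF where it
   * is signed.  Hence the modern idiom: */
  int
  main(void)
  {
          int c;  /* must be int, not char */

          while ((c = getchar()) != EOF)
                  putchar(c);
          return 0;
  }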

I wish you, and the list, a nice Sunday from Germany!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
