[TUHS] Bell Foreign-Language UNIX Efforts

Thu Mar 23 08:33:07 AEST 2023

Rob Pike wrote in
 <CAKzdPgwYPxK9oYemG5-vPgRR7mSfj_qkjD5-iJnLffP-23PUaQ at mail.gmail.com>:
 |The appendix version named it plain UTF, repurposing the extant name to the
 |new encoding. The -8 came later, as it is in these linked documents,
 |because some people wanted a UTF-7 and a UTF-16. Those people should be
 |punished.

I agree, but with a "but", please.

For one thing, especially so since UTF-7 (which I like) never made
it all the way through, and was only used here and there.
If only it had been used for everything mail- and DNS-related
to keep 7-bit compatibility!  Instead they introduced monstrosities
like IDNA for DNS, mUTF-7 (locale charset -> UTF-16BE -> mUTF-7), etc.
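To make that mUTF-7 pipeline concrete: IMAP mailbox names use the
"modified UTF-7" of RFC 3501, section 5.1.3.  Printable ASCII stands
for itself, "&" is written "&-", and any other run of characters is
encoded as UTF-16BE, then base64 without padding and with "," in
place of "/", wrapped in "&" ... "-" (plain UTF-7 shifts with "+"
instead).  The following is a minimal Python sketch, assuming the
name is already a Unicode string (the locale charset step is
skipped); the function name encode_mutf7 is made up for
illustration, and no decoder is shown.

```python
import base64

def encode_mutf7(name: str) -> str:
    """Encode a mailbox name as IMAP modified UTF-7 (RFC 3501, 5.1.3).

    Hypothetical helper: assumes `name` is already Unicode, i.e. the
    "locale charset" step of the pipeline has been done elsewhere.
    """
    out: list[str] = []
    pending: list[str] = []  # run of characters that must be shifted

    def flush() -> None:
        if pending:
            # Characters outside the BMP become surrogate pairs here.
            utf16 = "".join(pending).encode("utf-16-be")
            # Modified base64: no "=" padding, "," instead of "/".
            b64 = base64.b64encode(utf16).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:   # printable US-ASCII passes through
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)

print(encode_mutf7("Entwürfe"))                  # Entw&APw-rfe
print(encode_mutf7("~peter/mail/台北/日本語"))
```

The second call reproduces the example given in RFC 3501 itself,
"~peter/mail/&U,BTFw-/&ZeVnLIqe-".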

What I hated was IDNA.  They could have said: we give up on
backward compatibility around Y2K and let the old stuff grow out;
255 bytes of UTF-8 is surely enough for domain names for some
time (even percent-encoded), even for scripts that need four
bytes per code point, and it simply will not work with the older
software.  As it is, they introduced exactly the backward
incompatibilities they wanted to avoid.
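The byte-budget argument is easy to check.  A code point outside the
BMP takes four bytes in UTF-8, and twelve characters once each byte
is percent-encoded as "%XX", so the 255-byte figure above holds 63
or 21 of them (RFC 1035 also caps each individual label at 63
octets, which tightens this per label).  The Python sketch below does
that arithmetic and, for comparison, shows the ASCII-compatible form
produced by Python's built-in "idna" codec, which implements IDNA
2003.  The helper names are illustrative only.

```python
from urllib.parse import quote

# Overall name budget discussed in the post (RFC 1035: 255 octets).
NAME_LIMIT = 255

def max_code_points(limit: int, utf8_len: int, percent: bool) -> int:
    """How many code points of a given UTF-8 length fit into `limit` bytes."""
    per_cp = utf8_len * 3 if percent else utf8_len  # "%XX" per byte
    return limit // per_cp

def name_sizes(name: str) -> dict:
    """Raw UTF-8 size, percent-encoded size and IDNA form of a name."""
    raw = name.encode("utf-8")
    return {
        "utf8_bytes": len(raw),
        # With safe="", quote() keeps ASCII letters, digits and "_.-~"
        # unchanged and turns every other byte into %XX.
        "percent_chars": len(quote(raw, safe="")),
        # Built-in "idna" codec: IDNA 2003 with Punycode ("xn--") labels.
        "idna": name.encode("idna").decode("ascii"),
    }

print(max_code_points(NAME_LIMIT, 4, percent=False))  # 63
print(max_code_points(NAME_LIMIT, 4, percent=True))   # 21
print(name_sizes("bücher.example"))
```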

I strongly opposed UTF-16 in the past, but it has merits for some
languages as well as for programming, even though you have to deal
with surrogates and, if you are doing it right, with grapheme
boundaries, so the 1:many mapping is there anyhow.  I mean, wchar_t
is often 32 bits wide, and then possibly not even UTF-32.  But you
still have the 1:many, so it buys you nothing.
All-UTF-8 is of course great, IMHO.  (Asian users may disagree.)
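The 1:many point shows up as soon as the same text is counted in
code points, UTF-16 code units and UTF-8 bytes.  In the Python sketch
below (helper names are illustrative), a code point above U+FFFF
becomes a two-unit surrogate pair, computed by hand as well, and "e"
followed by U+0301 COMBINING ACUTE ACCENT is one user-perceived
character but two code points in every encoding.  Real grapheme
segmentation (Unicode UAX #29) is not in Python's standard library
and is not attempted here.  The CJK sample shows why some Asian users
may prefer UTF-16: three bytes per character in UTF-8, two in UTF-16.

```python
def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000              # 20-bit value
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # low 10 bits -> low (trail) surrogate
    return high, low

def counts(s: str) -> tuple[int, int, int]:
    """(code points, UTF-16 code units, UTF-8 bytes) for a string."""
    utf16_units = len(s.encode("utf-16-le")) // 2  # "-le": no byte order mark
    return len(s), utf16_units, len(s.encode("utf-8"))

samples = {
    "ASCII 'A'": "A",
    "precomposed e-acute": "\u00e9",
    "e + combining acute": "e\u0301",
    "emoji U+1F600": "\U0001F600",
    "CJK '日本語'": "日本語",
}
for label, text in samples.items():
    print(f"{label:22} code points/UTF-16 units/UTF-8 bytes = {counts(text)}")

print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```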

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)