International Unix

Fri Oct 25 05:57:31 AEST 1985

A couple of notes on the message from Erik Fair (ucbvax!fair):

Unfortunately, you CAN'T build a good international character set.
Some of those silly European countries have the same character in
several languages, but sort the character in different places in each
language.  They also have interesting constructs like characters that
sort as two characters, and pairs of characters that sort as single
characters.  That is, there might be a character @ which sorts as "xy",
so that @m sorts right after xylophone and before xyn.  Similarly, they
sometimes say that the pair ll sorts as a single character; I don't
remember where.
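
To make the "@ sorts as xy" case concrete, here is a minimal sketch in C.
It assumes a made-up rule table with exactly one entry ('@' expands to
"xy"); the names expand_key and collate_cmp are hypothetical, and keys
are capped at 127 characters to keep the example short.  It handles only
the one-character-sorts-as-two case, not pairs that sort as one character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical collation rule: the single character '@' sorts as "xy".
   A real language would need a whole table of such rules. */
static size_t expand_key(const char *s, char *out, size_t outsz)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        const char *piece = (*s == '@') ? "xy" : NULL;

        if (piece != NULL) {
            /* Copy the expansion in place of the original character. */
            for (; *piece != '\0'; piece++) {
                if (n + 1 < outsz)
                    out[n] = *piece;
                n++;
            }
        } else {
            if (n + 1 < outsz)
                out[n] = *s;
            n++;
        }
    }
    /* Terminate, truncating if the key did not fit. */
    if (outsz > 0)
        out[n < outsz ? n : outsz - 1] = '\0';
    return n;
}

/* Compare two words by their expanded keys, not by their raw codes. */
int collate_cmp(const char *a, const char *b)
{
    char ka[128], kb[128];

    expand_key(a, ka, sizeof ka);
    expand_key(b, kb, sizeof kb);
    return strcmp(ka, kb);
}
```

With this rule, collate_cmp puts "@m" after "xylophone" and before "xyn",
which a comparison of the raw character codes would not do.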

The character set is not (or should not be) a very basic assumption.
Aren't there EBCDIC UNIXes out there?  Most of the system is (should be)
completely independent of the character set.  The only places you should
have problems are programs which make assumptions about arithmetic
on characters, or about the range of values characters take on.
(Note that C promises that all characters in the machine's standard
character set are non-negative; this is not to say that all possible
values of a char variable are non-negative, however.)
What characters does the kernel (for instance) know and care about?
Slash (/), Null (\0), and maybe Dot (.) in the main body of the kernel;
a few control characters in the tty drivers.  No big deal.
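
As an illustration of the kind of assumption that bites, here is a rough
sketch in C of a range test on characters next to a version that asks the
library instead.  The function names are made up for the example.

```c
#include <ctype.h>

/* Non-portable: assumes 'a'..'z' are contiguous.  That holds in ASCII,
   but in EBCDIC the letters fall in separate runs with other codes in
   the gaps, so this test accepts some non-letters there. */
int is_lower_by_range(int c)
{
    return c >= 'a' && c <= 'z';
}

/* Portable: let the library decide.  The cast through unsigned char
   matters because a plain char variable may hold a negative value,
   which the ctype functions are not required to accept. */
int is_lower_portable(char c)
{
    return islower((unsigned char)c) != 0;
}
```

The second form makes no assumption about where the letters sit in the
character set or about whether char is signed.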

There will be work, but it shouldn't be too bad.

Much more grunt work is involved in isolating the messages for translation.
People writing code commercially should keep this in mind.  Keep your
messages in a separate module, or better yet in an external file.  Try to
make the code flexible about exactly how long messages are; the length will
vary dramatically when you translate the message, and English is usually
the most terse language.
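
A minimal sketch of what this might look like in C follows.  Everything
in it is an assumption made for illustration: the identifiers, the
two-language table, and the German strings (written without umlauts).
A real product would more likely load the table from an external file.

```c
#include <stdlib.h>
#include <string.h>

enum msg_id  { MSG_NO_FILE, MSG_READ_ONLY, MSG_COUNT };
enum lang_id { LANG_EN, LANG_DE, LANG_COUNT };

/* All user-visible text lives in this one table (illustrative strings). */
static const char *const catalog[LANG_COUNT][MSG_COUNT] = {
    { "no such file",         "file is read-only" },
    { "Datei nicht gefunden", "Datei ist schreibgeschuetzt" },
};

/* Look a message up by number; the rest of the code never sees text. */
const char *msg_text(enum lang_id lang, enum msg_id id)
{
    return catalog[lang][id];
}

/* Build "name: message" in a buffer sized from the actual strings, so
   nothing depends on how long the English text happened to be.
   Returns NULL if memory runs out; the caller frees the result. */
char *format_error(enum lang_id lang, const char *name, enum msg_id id)
{
    const char *text = msg_text(lang, id);
    size_t len = strlen(name) + 2 + strlen(text) + 1;
    char *buf = malloc(len);

    if (buf == NULL)
        return NULL;
    strcpy(buf, name);
    strcat(buf, ": ");
    strcat(buf, text);
    return buf;
}
```

The point of format_error is only that the buffer is sized at run time;
a fixed char buf[32] tuned to the English messages is exactly the sort
of thing that breaks when the translation comes back longer.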

Wouldn't it be easier to convince the Europeans to speak English? :-)