[TUHS] Character sets

Mon Mar 28 11:54:59 AEST 2016

Johnny Billquist scripsit:

> While true, I do not agree that Unicode is complicated because of
> writing systems. Unicode have surpassed the writing systems...

Yes, there is also incidental complexity required by the need for
various pre-existing factors.

> Yeah, you just need to suck in a few gigabytes of Unicode libraries
> in your 4K program. I'm not sure I agree that this is an acceptable
> solution.

I doubt if the program is really just 4K any more, and there are such
things as shared libraries.  The Asian width table is not very big
by itself, especially if you use runs of characters rather than individual
characters and do a binary search.

> Really. So how should Green Book (U+1F4D7) be rendered differently
> than Blue Book (U+1F4D8), or Orange Book (U+1F4D9) ?

See <http://unicode.org/emoji/charts/full-emoji-list.html> (slow to load)
and examine the fourth column ("Chart") for rows 1063-65.  Basically,
GREEN BOOK has vertical stripes on the cover, BLUE BOOK has horizontal
stripes, and ORANGE BOOK is black with white dots.

> And what are your thoughts on FULLWIDTH LATIN CAPITAL LETTER A
> (U+FF21). What is the semantic difference in having more whitespace
> around the letter? 

1-1 convertibility with various Japanese character sets.  Unicode is
not Cleanicode: it was designed not to do the best possible job, but
the best job possible under the circumstances.

-- 
John Cowan          http://www.ccil.org/~cowan        cowan at ccil.org
    "Any legal document draws most of its meaning from context.  A telegram
    that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
    5-bit Baudot code plus appropriate headers) is as good a legal document
    as any, even sans digital signature." --me