[TUHS] Character sets

Greg 'groggy' Lehey grog at lemis.com
Mon Mar 28 07:59:47 AEST 2016

On Sunday, 27 March 2016 at 23:53:32 +0200, Johnny Billquist wrote:
> On 2016-03-27 23:49, Greg 'groggy' Lehey wrote:
>> On Sunday, 27 March 2016 at 13:47:43 +0200, Johnny Billquist wrote:
>>> On 2016-03-27 13:29, John Cowan wrote:
>>>> Johnny Billquist scripsit:
>>>>> On 2016-03-27 08:18, Greg 'groggy' Lehey <grog at lemis.com> wrote:
>>>>>> Isn't it wonderful that we no longer have issues with character
>>>>>> representation?
>>>>> I hope that comment was meant as a joke, ironic, cynical, or whatever...
>>>> Undoubtedly.  But things *are* much better than they used to be:
>>>> we can now do everything within a single character set, and convert
>>>> only at the boundaries (and increasingly, only in one direction).
>>> Haha. Yes... Except that you now have multiple representations of each
>>> character within one character set. So what has improved???
>> In the Good Old Days, characters were all the same size, and you could
>> do nice, simple things like
>>    while (*c && *c++ != ' ');
>> Now you need a whole library to do the same thing.
> Another one I noted a while ago was that functions and commands in Unix,
> such as lpq, which try to print things in nice columns now fail, because
> the code doesn't actually know how many characters have been output.
> And let's not even talk about such wonderful concepts as colors in the
> character set definition... Unicode seems to have it all... I wonder how
> many code points exist for 'A'. It's definitely more than one...
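
The column problem quoted above typically comes from counting bytes
rather than characters.  A minimal sketch in C of the difference,
assuming the output is well-formed UTF-8; utf8_codepoints is a
hypothetical helper written for illustration, not anything lpq
actually uses:

```c
#include <stddef.h>

/*
 * Count the code points in a UTF-8 string (hypothetical helper).
 * Every code point starts with exactly one byte that is NOT a
 * continuation byte (10xxxxxx), so counting those bytes is enough.
 * Assumes well-formed UTF-8; no validation is done.
 */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)   /* skip continuation bytes */
            n++;
    return n;
}
```

Even this is only part of the answer.  A precomposed U+00C5 counts as
one code point and 'A' followed by combining U+030A counts as two, yet
both occupy one column.  POSIX wcwidth() is the usual starting point
for real column widths.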

For some definition of A, of course.  In addition to the Latin A
(U+0041), there's clearly at least Α (U+0391, Greek capital alpha) and
А (U+0410, Cyrillic capital A).
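
To see that these look-alikes really are different code points, here
is a minimal sketch in C that decodes the first code point of a UTF-8
string.  utf8_first_codepoint is a hypothetical helper, and it assumes
well-formed input:

```c
#include <stdint.h>

/*
 * Decode the first code point of a UTF-8 string (hypothetical helper).
 * Sketch only: assumes well-formed input and does no validation.
 */
uint32_t utf8_first_codepoint(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (p[0] < 0x80)                    /* 1 byte:  0xxxxxxx */
        return p[0];
    if ((p[0] & 0xE0) == 0xC0)          /* 2 bytes: 110xxxxx 10xxxxxx */
        return ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    if ((p[0] & 0xF0) == 0xE0)          /* 3 bytes: 1110xxxx + 2 continuation */
        return ((uint32_t)(p[0] & 0x0F) << 12)
             | ((uint32_t)(p[1] & 0x3F) << 6)
             | (p[2] & 0x3F);
    /* 4 bytes: 11110xxx + 3 continuation */
    return ((uint32_t)(p[0] & 0x07) << 18)
         | ((uint32_t)(p[1] & 0x3F) << 12)
         | ((uint32_t)(p[2] & 0x3F) << 6)
         | (p[3] & 0x3F);
}
```

Fed the UTF-8 encodings of the three letters, this gives 0x41 for the
Latin A, 0x391 for the Greek one (bytes CE 91) and 0x410 for the
Cyrillic one (bytes D0 90), so a byte or code point comparison treats
them as three unrelated characters.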

