[TUHS] Character sets

Random832 random832 at fastmail.com
Mon Mar 28 11:20:32 AEST 2016


On Sun, Mar 27, 2016, at 19:30, John Cowan wrote:
> > >   while (*c && *c++ != ' ');
> 
> That particular piece of code still works if the encoding is UTF-8.

Sure it does, but replace that != ' ' with !isblank(*c) and it doesn't
work anymore, since isblank() looks at one byte at a time and can't
recognize a blank that is encoded as a multibyte sequence. Often you don't
care, but you've got to remember to set LC_ALL=C when running grep etc.
on large data sets, or it will be much slower in a UTF-8 locale, since \w
and \s have to decode multibyte characters (as does case-insensitive
matching, and so on).
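
A minimal sketch of the difference, assuming a C99 hosted environment. The
function names skip_to_blank_bytes and skip_to_blank_mb are hypothetical,
made up for illustration. The first scans byte by byte like the isblank()
variant above (with the unsigned char cast that isblank() requires); the
second decodes characters with mbrtowc() under the current LC_CTYPE locale
and tests each one with iswblank().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return a pointer to the first blank byte in s,
   or to the terminating NUL if there is none. isblank() takes an int in
   the unsigned char range, so cast; passing a negative char (a UTF-8
   lead or continuation byte where char is signed) is undefined. A blank
   encoded as several bytes is never recognized here. */
const char *skip_to_blank_bytes(const char *s)
{
    while (*s && !isblank((unsigned char)*s))
        s++;
    return s;
}

/* Hypothetical helper: same contract, but decodes one character at a
   time according to the current LC_CTYPE locale and classifies it with
   iswblank(). Invalid or truncated sequences are skipped one byte at a
   time with the conversion state reset. */
const char *skip_to_blank_mb(const char *s)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t left = strlen(s);    /* bytes remaining before the NUL */

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2) {
            /* not a valid character here: step over one byte */
            memset(&st, 0, sizeof st);
            n = 1;
        } else if (iswblank((wint_t)wc)) {
            break;              /* s points at the start of the blank */
        }
        s += n;                 /* n >= 1: no NUL lies within 'left' */
        left -= n;
    }
    return s;
}
```

The program has to call setlocale(LC_CTYPE, "") (or name a UTF-8 locale)
for the second function to decode UTF-8; in the default "C" locale it
behaves like the byte scanner on ASCII input. Which wide characters count
as blank depends on the C library's locale tables: with glibc's UTF-8
locale data, U+3000 IDEOGRAPHIC SPACE is classified as blank, so on
"a\xe3\x80\x80" "b c" the multibyte scanner stops at offset 1 while the
byte scanner runs on to the ASCII space at offset 5. That per-character
mbrtowc() call is the kind of extra work that makes multibyte-aware
matching slower than a plain byte loop.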


