[TUHS] Bell Foreign-Language UNIX Efforts

John Cowan cowan at ccil.org
Tue Mar 21 08:01:23 AEST 2023


On Mon, Mar 20, 2023 at 4:48 PM Steffen Nurpmeso <steffen at sdaoden.eu> wrote:

> However note that even something like "uppercase this string"
> cannot be done the right way, because a truly Unicode aware
> operation needs to look at the entire string (sentence), because
> there may be interdependencies that modify the result.


If you are talking about downcasing Greek Σ, then it's true that always
downcasing Σ to σ is inadequate.  Unicode specifies that if the Σ appears
before a space or punctuation mark, it downcases to ς instead.  But this is
not always correct.

For example, if the string "ΦΙΛΟΣ." is the word "φίλος" (meaning 'beloved'
or 'friend') at the end of a sentence, "φιλος." is the correct downcasing.
But if it is the abbreviation for "φιλοσοφία", meaning "philosophy", then
the correct downcasing is "φιλοσ."  So getting this right is an AI-complete
problem which neither Unicode nor ICU can solve.
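
To make the ambiguity concrete, here is a minimal sketch in Python 3, whose
built-in str.lower() applies the Unicode final-sigma rule. The helper name
downcase and its is_abbreviation flag are hypothetical, invented for this
illustration; the flag has to come from a human or some other oracle,
because the string alone does not say which reading is meant.

```python
FINAL_SIGMA = "\u03c2"   # ς, used at the end of a word
MEDIAL_SIGMA = "\u03c3"  # σ, used everywhere else

def downcase(text: str, is_abbreviation: bool = False) -> str:
    """Lowercase text; hypothetical helper, not part of any library.

    str.lower() picks ς for a Σ that ends a word. If the caller knows the
    text is an abbreviation cut off in mid-word, put the medial σ back.
    """
    lowered = text.lower()
    if is_abbreviation:
        lowered = lowered.replace(FINAL_SIGMA, MEDIAL_SIGMA)
    return lowered

# Same input, two different correct answers depending on the meaning.
as_word = downcase("ΦΙΛΟΣ.")
as_abbreviation = downcase("ΦΙΛΟΣ.", is_abbreviation=True)
```

The rule-based result is right for the word and wrong for the abbreviation,
and nothing in the input distinguishes the two cases.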