[TUHS] Bell Foreign-Language UNIX Efforts

Steffen Nurpmeso steffen at sdaoden.eu
Tue Mar 21 08:28:19 AEST 2023


John Cowan wrote in
 <CAD2gp_TgTFL5agm8Z=immnGiMkpELL-wM_ZXos8OcKngw=2DLw at mail.gmail.com>:
 |On Mon, Mar 20, 2023 at 4:48 PM Steffen Nurpmeso <steffen at sdaoden.eu> wrote:
 |
 |> However note that even something like "uppercase this string"
 |> cannot be done the right way, because a truly Unicode aware
 |> operation needs to look at the entire string (sentence), because
 |> there may be interdependencies that modify the result.
 |
 |If you are talking about downcasing Greek Σ, then it's true that always
 |downcasing Σ to σ is inadequate.  Unicode specifies that if the Σ appears
 |before a space or punctuation mark, it downcases to ς instead.  But this is
 |not always correct.
 |
 |For example, if the string "ΦΙΛΟΣ." is the word "φιλος" (meaning 'beloved'
 |or 'friend') at the end of a sentence, "φιλος." is the correct downcasing.
 |But if it is the abbreviation for "φιλοσοφία", meaning "philosophy", then
 |the correct downcasing is "φιλοσ."  So getting this right is an AI-complete
 |problem which neither Unicode nor ICU can solve.
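A minimal sketch of the context rule John describes, assuming CPython 3.3 or later, whose str.lower() implements Unicode's Final_Sigma condition. Strictly, that condition is "preceded by a cased letter and not followed by one, skipping case-ignorable characters", which is what "before a space or punctuation mark" approximates. The helper name show() and the sample strings are illustrative only:

```python
# Sketch: Python's str.lower() (CPython 3.3+) applies Unicode's Final_Sigma
# rule, so the lowercase form of capital sigma depends on its neighbours.

def show(s: str) -> str:
    """Hypothetical helper: format a string next to its lowercase form."""
    return f"{s!r} -> {s.lower()!r}"

word = "ΦΙΛΟΣ."      # "friend" at the end of a sentence
inner = "ΦΙΛΟΣΟΦΙΑ"  # sigma in the middle of a word

print(show(word))    # 'ΦΙΛΟΣ.' -> 'φιλος.'     (final sigma)
print(show(inner))   # 'ΦΙΛΟΣΟΦΙΑ' -> 'φιλοσοφια' (medial sigma)

# The abbreviation of "φιλοσοφία" is spelled exactly like the word, so a
# context-free rule cannot tell them apart: it also yields 'φιλος.' here,
# although 'φιλοσ.' would be correct for the abbreviation.
```

The rule only looks at neighbouring characters, which is precisely why it cannot resolve the word-versus-abbreviation case.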

Oh, I wish I could speak, read and write (ancient) Greek.
Unfortunately, after English, I either had to change schools
or choose between French and Latin (I would have given
everything for Chinese, Japanese, and/or Russian), so I chose
Latin.  And although I started out as one of the three best, I then
watched an interview with a CDU ("republican") state secretary
and the wonderful Lea Rosh, in which he spoke Latin; and
although she repeatedly said "I understand you, but what about
the audience?", I, as a young teenager, was _so_ pissed off
that "I quit", like in Günter Grass's novel "The Tin Drum".
That pulled my grade point average down a bit.

But yes, I think quite a lot of languages have this problem.  Even
my own native language, German, has it with the conversion of the
lowercase sharp s (ß), even though for over a hundred years some
have tried to establish an uppercase variant, which the Swiss tongue
has.  (Mind you, after WWII that uppercase SS was forbidden, at least
in some renderings, like the one used by the US rock band Kiss ..not.)
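A small illustration of the sharp-s problem, assuming CPython 3.3 or later, whose str.upper()/str.lower() follow Unicode's default full case mappings. The sample word and variable names are illustrative only:

```python
# Sketch: German sharp s under Unicode's default (context-free) case mapping,
# as implemented by Python's str.upper()/str.lower() (CPython 3.3+).

lower = "straße"
upper = lower.upper()            # full case mapping: ß becomes the two letters "SS"
print(upper, len(lower), len(upper))   # STRASSE 6 7

# The mapping is not reversible: lowercasing gives "strasse", not "straße".
print(upper.lower())             # strasse

# A capital sharp s (U+1E9E) exists and lowercases to ß, but the default
# uppercase mapping of ß still yields "SS", never U+1E9E.
capital = "\u1E9E"
print(capital, capital.lower(), "ß".upper())   # ẞ ß SS
```

The string grows by one character on uppercasing, and the round trip loses information, so "uppercase this string" cannot be a simple per-character table lookup.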

If you ask on the Unicode mailing list, you will be told to convert
only entire sentences.  But according to the Unicode FAQ, the Greek
sigma seems to be very special.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)