Hi Rob,
The output of "unicode 5d0-5e7"
(robpike.io/cmd/unicode has the
command) is fun.
05d0 א 05d1 ב 05d2 ג 05d3 ד
05d4 ה 05d5 ו 05d6 ז 05d7 ח
05d8 ט 05d9 י 05da ך 05db כ
05dc ל 05dd ם 05de מ 05df ן
05e0 נ 05e1 ס 05e2 ע 05e3 ף
05e4 פ 05e5 ץ 05e6 צ 05e7 ק
For comparison, here is "unicode 3d0-3e7". It will be fun to watch how
it's rendered.
03d0 ϐ 03d1 ϑ 03d2 ϒ 03d3 ϓ
03d4 ϔ 03d5 ϕ 03d6 ϖ 03d7 ϗ
03d8 Ϙ 03d9 ϙ 03da Ϛ 03db ϛ
03dc Ϝ 03dd ϝ 03de Ϟ 03df ϟ
03e0 Ϡ 03e1 ϡ 03e2 Ϣ 03e3 ϣ
03e4 Ϥ 03e5 ϥ 03e6 Ϧ 03e7 ϧ
In the terminal where I read and write email, every pair is laid out
like ‘0041 A’: the code point first, then its character.
But save the email's text/plain to foo.txt and foo.html, add a little HTML
to foo.html, and the browser, here Firefox, presents the Hebrew in both as
05d0 05 אd1 05 בd2 05 גd3 ד
05d4 05 הd5 05 וd6 05 זd7 ח
05d8 05 טd9 05 יda 05 ךdb כ
05dc 05 לdd 05 םde 05 מdf ן
05e0 05 נe1 05 סe2 05 עe3 ף
05e4 05 פe5 05 ץe6 05 צe7 ק
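due to the mix of Unicode's strong, weak, and neutral bi-directional
character types: each Hebrew letter is strong right-to-left, the space
after it is neutral, and the digits ‘05’ that follow are weak, so all
three are laid out as one right-to-left run.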
Seeing what I intend above needs a ‘broken’ renderer, like a terminal.
For those with more intelligent renderers, it's as if runes normally
drawn as
00c0 À 00c1 Á 00c2 Â 00c3 Ã
became
00c0 00 Àc1 00 Ác2 00 Âc3 Ã
Wrapping each of the Hebrew characters in the text and HTML files in
LRI...PDI,
LRI U+2066 Left-to-right isolate
PDI U+2069 Pop directional isolate
so the first row becomes
0030 0035 0064 0030 0020 2066 05d0 2069 0020
0030 0035 0064 0031 0020 2066 05d1 2069 0020
0030 0035 0064 0032 0020 2066 05d2 2069 0020
0030 0035 0064 0033 0020 2066 05d3 2069 000a
has Firefox display the tables as intended. Perhaps the unicode command
should do this to ensure correct display, especially if some terminals
ever start to improve?
I note that vim(1) here doesn't realise LRI and PDI are zero width
so the cursor position drifts past the end of the visible line.
ed(1) copes without a murmur.
--
Cheers, Ralph.