[TUHS] Canonical Historic Character Encoding Conversion?
segaloco via TUHS
tuhs at tuhs.org
Thu Nov 13 08:19:00 AEST 2025
So I was working up a draft procedure for identifying strings in
NES and other tile-based video game titles, and was intending
for iconv(1) to be my central encoding converter. Unfortunately,
I was today years old when I noticed:
> The implementation need not support the use of charmap files
> for codeset conversion unless the POSIX2_LOCALEDEF symbol is
> defined on the system.
However the rationale section of the page then states:
> The iconv utility can be used portably only when the user
> provides two charmap files as option-arguments.
Don't these two statements contradict one another, one stating
that a feature is not required, and the other stating that a
feature is required for portability? Isn't the whole point of
POSIX portability?
In any case, this has me wondering whether there was ever an
older, better-guaranteed method for arbitrary encoding conversions.
There is the well-known ASCII-EBCDIC conversion in dd(1), but
this mechanism does not seem to be extensible to other, arbitrary
character encodings.
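To make "arbitrary encoding conversion" concrete, here is a minimal sketch of the table-driven translation a charmap would otherwise describe, written in Python purely for illustration. The table contents, the names TILE_TABLE, decode_tiles and find_strings, and the minimum run length are all hypothetical; the byte values are invented and do not come from any actual title.

```python
# Hypothetical tile-index table: byte 0x00 is a space and bytes
# 0x0A..0x23 are the letters A..Z. These values are made up for
# illustration, not taken from any real game.
TILE_TABLE = {0x00: " "}
TILE_TABLE.update({0x0A + i: chr(ord("A") + i) for i in range(26)})


def decode_tiles(data, table, unknown="."):
    """Map each byte through the table; unmapped bytes become `unknown`."""
    return "".join(table.get(b, unknown) for b in data)


def find_strings(data, table, min_len=4):
    """Yield (offset, text) for each run of at least min_len mapped bytes,
    in the spirit of strings(1) but using a custom table."""
    start = None
    for i, b in enumerate(data):
        if b in table:
            if start is None:
                start = i  # a new run of mapped bytes begins here
        else:
            if start is not None and i - start >= min_len:
                yield start, decode_tiles(data[start:i], table)
            start = None
    # Flush a run that extends to the end of the data.
    if start is not None and len(data) - start >= min_len:
        yield start, decode_tiles(data[start:], table)
```

Something like list(find_strings(rom_bytes, TILE_TABLE)) would then list candidate strings with their offsets, where rom_bytes is whatever byte string was read from the image. This is only a stand-in for the conversion step, not a replacement for a standard utility.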
Is there some historic mechanism I'm overlooking? In the end I
can go with another approach, but I want to use the canonical
UNIX way if at all possible. I don't want a repeat of the shock
I had today, when, after carefully drafting several charmaps for
a project, I found that the POSIX standard has no teeth in that
area and guarantees pretty much nothing.
- Matt G.