[TUHS] non-Posix input files
Douglas McIlroy
douglas.mcilroy at dartmouth.edu
Fri Mar 28 05:50:50 AEST 2025
It seems my reply to Clem went astray. POSIX Part 2, Shell and
Utilities, is very clear:
2.2.2.181 text file: A file that contains characters organized into
one or more lines.
The lines shall not contain NUL characters ...
2.2.2.95 line: A sequence of zero or more non-<newline> characters
plus a terminating <newline> character.
Oddly--and in my opinion wrongly--the standard excludes empty files.
It would be a shock if an editor or groff refused to process an empty
file and thereby broke Kernighan's law, "Do nothing gracefully".
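
For concreteness, here is a minimal sketch of the two definitions quoted
above, written as a predicate in Python. The function names are made up
for illustration, and whatever the "..." in the quotation elides is
deliberately not checked. The second function is only my preferred
reading, not what the standard says.

```python
def is_posix_text(data: bytes) -> bool:
    """Test only the two quoted definitions: one or more lines, each
    terminated by <newline>, and no NUL characters."""
    if not data:
        # Zero lines: excluded by "one or more lines".
        return False
    if b"\0" in data:
        # "The lines shall not contain NUL characters".
        return False
    # Every line, including the last, needs its terminating <newline>.
    return data.endswith(b"\n")


def is_text_or_empty(data: bytes) -> bool:
    """Hypothetical variant that also accepts the empty file, in the
    spirit of "Do nothing gracefully"."""
    return data == b"" or is_posix_text(data)
```

Under this sketch a file holding a single newline counts as one empty
line and passes, while an empty file and a file whose last line lacks a
newline both fail the strict test.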
Doug
On Thu, Mar 27, 2025 at 3:35 PM Clem Cole <clemc at ccc.com> wrote:
>
> Chet - as I said, we tried so hard to keep that kind of crap out. Dennis was right. FWIW: with UNIX (POSIX), input will end on an EOF, and the ANSI C utilities will obey it or a newline, so you can write the code to work fine either way. But that's a choice of implementation/what subroutines - how you think about the data.
>
> I'll accept that that is what the words say in the >>awk<< specification document, but as one of the original authors of the first UNIX standard and the later POSIX standard I can say we tried hard to make sure we got it right and followed the idea that a regular file has no structure, and never to allow the standard to impose one. I think the core standard still says that, and the basic idea is unchanged. The actual structure of the input file is an application idea, not a UNIX/POSIX defined idea.
>
> The issue here is the term POSIX. Do you mean the kernel (.1), or a >>specific<< application in .2 (the C compiler itself, awk, ed), which might put structure onto the file? That's fine. The >>OS<< does not set the structure - it is done by something else.
>
> I understand having the application do it; I wish it did not. Many applications (even text editors) can be (and have been) written without needing one specific structure, which is my point. I also accept that the folks who took over the standard in the name of "progress" changed (relaxed) much of what we worked so hard to avoid, knowing there were dragons - particularly WRT textual information. We really did not want to repeat the errors of the 1960s. As George Santayana originally wrote, “Those who cannot remember the past are condemned to repeat it.”
>
> On Thu, Mar 27, 2025 at 3:05 PM Chet Ramey <chet.ramey at case.edu> wrote:
>>
>> On 3/27/25 3:00 PM, Clem Cole wrote:
>> > Argh -- I stand corrected. We worked hard at the beginning to keep that
>> > crap out -- sigh.
>> >
>> > But at least it does say: POSIX.1-2024 does not distinguish between
>> > text files and binary files (see the ISO C standard)
>>
>> It also says "The standard utilities that have such restrictions always
>> specify "text files" in their STDIN or INPUT FILES sections," so you can't
>> avoid it.
>>
>> awk is one such utility (sh is not). This is an application requirement, so
>> awk is required to add a newline at the end of a file that does not have
>> one.
>>
>> --
>> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>> ``Ars longa, vita brevis'' - Hippocrates
>> Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/