Unix text files
Kay Dekker
kay at warwick.UUCP
Sun Oct 27 02:37:17 AEST 1985
Extensive quoting ensues, as I've moved the discussion to net.unix from
net.bugs, and people may have missed this...
Some time back, gwyn at brl-tgr.ARPA (Doug Gwyn <gwyn>) wrote:
>> >Many UNIX text-file utilities will discard a (necessarily final)
>> >text line that does not end in a newline. Quite simply, such a
>> >file is not a proper UNIX text file.
and I responded with:
>> Who says? Where's the definition of a 'proper' UNIX text file?
to which he replied:
>The problem is, there are several interpretations of such a file,
>depending on the utility involved. Perhaps there should be a
>well-defined standard interpretation, but there isn't currently.
>
>"A file of text consists simply of a string of characters, with
>lines demarcated by the newline character." -- from "The UNIX
>Time-Sharing System" by Ritchie & Thompson
>
>"text file, ASCII file -- a file, the bytes of which are understood
>to be in ASCII code" -- from "Glossary" in "UNIX Time-Sharing
>System Programmer's Manual", 8th Ed.
>
>"A text stream is an ordered sequence of bytes composed into lines,
>each line consisting of zero or more characters plus a terminating
>new-line character. ... The sequentially last character read in
>from a text stream will, however, always be sequentially the last
>character that was earlier written out to the text stream, if that
>character was a new-line." -- from ANSI X3J11/85-045
>
>My personal choice would be similar to Ritchie & Thompson, where
>newlines delimit (NOT "terminate") text lines, so that the last
>character in a text file would not need to be a newline. However,
>this raises the question of what utilities should do with the
>null line at the end of every text file that DOES end with a
>newline; this will still be utility-dependent (and should be
>documented whenever it is handled differently from other text
>lines in the file).
>
>X3J11/85-045 botched it anyhow, since they intended that ALL UNIX
>files qualify as "text streams" under stdio (vs. "binary streams",
>which have to be handled differently on some non-UNIX OSes).
>
>So, how do we establish a standard interpretation for non-newline-
>terminated UNIX text files?
Doug,
I may be being optimistic (and thus *wrong*) but I don't see where
the problem with your suggestion [newlines delimiting text lines] lies:
the rule would be, simply,
"Text consists of an ordered sequence of characters, with lines delimited
by newline characters. Text is normally terminated by a newline. This
newline should be considered to be followed by a (nonexistent) null line.
The null line should not be considered to be part of the text.
"If the last character of the text is not a newline, then consider
the text to be terminated by a newline - null line pair; however, this
newline - null line pair should not be considered to have been part of
the file."
I *think* that's right...
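
The rule above can be sketched in a few lines of Python. This is only an
illustration of the proposal, not how any existing utility behaves: the
function name split_lines is hypothetical, and treating a completely empty
text as containing zero lines is an assumption, since the rule does not
address that case.

```python
def split_lines(text: str) -> list[str]:
    """Split text into lines, treating newlines as delimiters.

    Hypothetical helper illustrating the proposed rule: the null line
    that follows a final newline is not part of the text, and a text
    that lacks a final newline is read as if it had one.
    """
    pieces = text.split("\n")
    # Splitting on the delimiter always leaves one piece after the last
    # newline. If the text ended with a newline, that piece is the null
    # line, which the rule says is not part of the text.
    if pieces[-1] == "":
        pieces.pop()
    return pieces
```

Under this reading "a\nb\n" and "a\nb" yield the same two lines, while
"a\n\n" yields "a" followed by one genuinely empty line.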
Kay.
--
"The only good thing that I can find to say about the idea of colonies
in space is that America could, at last, have a world to herself."
-- Elisabeth Zyne
... mcvax!ukc!warwick!flame!kay