Unix text files

Kay Dekker kay at warwick.UUCP
Sun Oct 27 02:37:17 AEST 1985


Extensive quoting ensues, as I've moved the discussion to net.unix from
net.bugs, and people may have missed this...

Sometime back, gwyn at brl-tgr.ARPA (Doug Gwyn <gwyn>) wrote:

>> >Many UNIX text-file utilities will discard a (necessarily final)
>> >text line that does not end in a newline.  Quite simply, such a
>> >file is not a proper UNIX text file.

and I responded with:

>> Who says?  Where's the definition of a 'proper' UNIX text file?

to which he replied:

>The problem is, there are several interpretations of such a file,
>depending on the utility involved.  Perhaps there should be a
>well-defined standard interpretation, but there isn't currently.
>
>"A file of text consists simply of a string of characters, with
>lines demarcated by the newline character."  -- from "The UNIX
>Time-Sharing System" by Ritchie & Thompson
>
>"text file, ASCII file -- a file, the bytes of which are understood
>to be in ASCII code"  -- from "Glossary" in "UNIX Time-Sharing
>System Programmer's Manual", 8th Ed.
>
>"A text stream is an ordered sequence of bytes composed into lines,
>each line consisting of zero or more characters plus a terminating
>new-line character.  ...  The sequentially last character read in
>from a text stream will, however, always be sequentially the last
>character that was earlier written out to the text stream, if that
>character was a new-line."  -- from ANSI X3J11/85-045
>
>My personal choice would be similar to Ritchie & Thompson, where
>newlines delimit (NOT "terminate") text lines, so that the last
>character in a text file would not need to be a newline.  However,
>this raises the question of what utilities should do with the
>null line at the end of every text file that DOES end with a
>newline; this will still be utility-dependent (and should be
>documented whenever it is handled differently from other text
>lines in the file).
>
>X3J11/85-045 botched it anyhow, since they intended that ALL UNIX
>files qualify as "text streams" under stdio (vs. "binary streams",
>which have to be handled differently on some non-UNIX OSes).
>
>So, how do we establish a standard interpretation for non-newline-
>terminated UNIX text files?
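The two readings in Doug's message can be made concrete with a minimal C sketch. It is not from the original thread, and the function names are hypothetical. Under the "terminate" reading of the X3J11 draft, the number of complete lines equals the number of newline characters. Under the "delimit" reading of Ritchie & Thompson, k newlines separate k + 1 pieces, so a file that ends in a newline ends with an empty piece: the null line Doug mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, "terminate" reading (X3J11 draft): a line is the
   characters up to and including a newline, so the count of complete
   lines is the count of newline characters.  Characters after the last
   newline form an incomplete line, which this count ignores, as the
   utilities that discard such a line would. */
size_t count_terminated_lines(const char *buf, size_t len)
{
    size_t n = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}

/* Hypothetical helper, "delimit" reading (Ritchie & Thompson): newlines
   separate pieces, so k newlines delimit k + 1 pieces.  A file ending in
   a newline therefore ends with an empty ("null") piece, and an empty
   file is a single null piece. */
size_t count_delimited_pieces(const char *buf, size_t len)
{
    return count_terminated_lines(buf, len) + 1;
}
```

For "a\nb\n" the first function returns 2 and the second returns 3. For "a\nb" they return 1 and 2. That gap over the unterminated last line is exactly what is at issue.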

Doug,
	I may be being optimistic (and thus *wrong*) but I don't see where
the problem with your suggestion [newlines delimiting text lines] lies:
the rule would be, simply,

"Text consists of an ordered sequence of characters, with lines delimited
by newline characters.  Text is normally terminated by a newline.  This
newline should be considered to be followed by a (nonexistent) null line.
The null line should not be considered to be part of the text.
	"If the last character of the text is not a newline, then consider
the text to be terminated by a newline - null line pair; however, this
newline - null line pair should not be considered to have been part of
the file."
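The rule above can be read as a line-splitting procedure. The following is a minimal C sketch under that reading, not code from the thread; `text_line` and `split_text_lines` are hypothetical names. A final newline contributes no extra null line. A trailing fragment without a newline still counts as a line, and nothing is written back into the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one line found by split_text_lines(). */
typedef struct {
    const char *start;  /* first character of the line */
    size_t len;         /* length, not counting any newline */
} text_line;

/* Hypothetical helper: split buf[0..len) into lines under the proposed rule.
   - newlines delimit lines;
   - the null line that follows a final newline is not part of the text;
   - if the text does not end in a newline, act as if it did, so the
     trailing fragment is still a line (no newline is added to buf).
   Stores at most max lines in out and returns the total number of lines,
   so out may be NULL when max is 0 (count only). */
size_t split_text_lines(const char *buf, size_t len, text_line *out, size_t max)
{
    size_t count = 0, begin = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i <= len; i++) {
        int at_end = (i == len);
        if (!at_end && buf[i] != '\n')
            continue;              /* still inside the current line */
        /* At end of data, an empty final piece is the null line after a
           terminating newline (or an empty file): not part of the text. */
        if (at_end && begin == len)
            break;
        if (count < max) {
            out[count].start = buf + begin;
            out[count].len = i - begin;
        }
        count++;
        begin = i + 1;             /* next line starts after the newline */
    }
    return count;
}
```

With this rule, "a\nb\n" and "a\nb" both yield the two lines "a" and "b". An empty file yields no lines, and "\n" yields a single empty line.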

I *think* that's right...
							Kay.
-- 
"The only good thing that I can find to say about the idea of colonies
in space is that America could, at last, have a world to herself."
						-- Elisabeth Zyne
			... mcvax!ukc!warwick!flame!kay