Unix text files

Danny Zerkel danny at nvzg2.UUCP
Wed Oct 30 01:11:43 AEST 1985


> > "Text consists of an ordered sequence of characters, with lines delimited
> > by newline characters.  Text is normally terminated by a newline.  This
> > newline should be considered to be followed by a (nonexistent) null line.
> > The null line should not be considered to be part of the text.
> > 	"If the last character of the text is not a newline, then consider
> > the text to be terminated by a newline - null line pair; however, this
> > newline - null line pair should not be considered to have been part of
> > the file.
> > 
> > I *think* that's right...
> > 							Kay.
> 
> Perhaps that is the best interpretation, but it sure is hard
> to put all that into a formal grammar, whereas the original
> concept was very simple:
> 
> file		::=	binary_file	|	text_file
> 
> binary_file	::=	{ byte }*
> 
> byte		::=	<primitive unit of data,
> 				at least 8 bits>
> 
> text_file	::=	{ text_line }*
> 
> text_line	::=	{ text_char }* newline
> 
> text_char	::=	<7-bit ASCII character
> 				excluding NUL and newline>
> 
> newline		::=	<ASCII LF character>
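
The quoted grammar is simple enough to check mechanically. Below is a minimal C sketch of such a check, assuming the whole file is already in memory as a byte buffer; the function name is_strict_text_file is hypothetical. Under this grammar every byte must be 7-bit ASCII other than NUL, and a non-empty text file must end in a newline.

```c
#include <stddef.h>

/* Hypothetical checker for the quoted grammar:
 *   text_file ::= { text_line }*
 *   text_line ::= { text_char }* newline
 * text_char is 7-bit ASCII excluding NUL and LF; newline is LF.
 * Returns 1 if buf[0..len) is a text_file, 0 otherwise. */
int is_strict_text_file(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\0' || buf[i] > 0x7F)
            return 0;          /* neither a text_char nor a newline */
    }
    /* Every text_line ends in a newline, so a non-empty file must too.
     * The empty file is zero lines and therefore valid. */
    return len == 0 || buf[len - 1] == '\n';
}
```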

Hmmm... this sounds like the old variable-length data representation problem.
It seems to me there are two fundamental representations of variable-length
data: counting and sentinels.  Anything not fitting these molds is
unstructured, or at best partially structured.
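
A minimal C sketch of the two representations, for illustration only (the names struct counted, counted_length and sentinel_length are hypothetical). With counting, the length is stored up front; with a sentinel, it has to be found by scanning for a reserved marker value, here NUL.

```c
#include <stddef.h>

/* Counting: the length travels with the data. */
struct counted {
    size_t len;
    const char *data;
};

size_t counted_length(const struct counted *s)
{
    return s->len;              /* O(1): just read the stored count */
}

/* Sentinel: the data runs until a reserved marker value (NUL here). */
size_t sentinel_length(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')        /* O(n): scan until the marker is found */
        n++;
    return n;
}
```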

It looks like a text file is being represented as a variable number of
variable-length strings, except that the number of lines is unknown (though
indirectly derivable), and the sentinel marking the end of a line is
optional on the last line.
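
The "indirectly derivable" line count can be sketched in C, assuming the text sits in memory and adopting the interpretation quoted at the top (a missing final newline is treated as if it were present). The function name count_lines is hypothetical.

```c
#include <stddef.h>

/* Derive the number of lines from the newline sentinels alone.
 * An unterminated last line is counted as if its newline were present. */
size_t count_lines(const char *buf, size_t len)
{
    size_t i, lines = 0;

    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            lines++;
    if (len > 0 && buf[len - 1] != '\n')
        lines++;                /* last line had no terminating newline */
    return lines;
}
```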

  "Look, ma, unstructured data!"
  "Avert your eyes son, or it will blind and confuse you."

Does anyone out there want to show those of us with weak knees how one
would use this kind of data structure [used loosely] in a program (in
other words, as if the data were inside the program rather than outside
it), without additional support information such as the number and
lengths of the lines?
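
One possible answer, sketched in C under the assumption that the text is already in a memory buffer: a cursor that hands back one line at a time by rescanning for the next newline, storing no line counts or lengths between calls. The name next_line and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor-style line reader.  Given the buffer and the current
 * offset *pos, return the start of the next line and store its length
 * (excluding the newline) in *line_len.  Returns NULL when the text is
 * exhausted.  Nothing about the lines is remembered between calls. */
const char *next_line(const char *buf, size_t len, size_t *pos, size_t *line_len)
{
    const char *start;
    const char *nl;

    if (*pos >= len)
        return NULL;
    start = buf + *pos;
    nl = memchr(start, '\n', len - *pos);
    if (nl != NULL) {
        *line_len = (size_t)(nl - start);
        *pos += *line_len + 1;  /* step past the newline sentinel */
    } else {
        *line_len = len - *pos; /* last line had no newline */
        *pos = len;
    }
    return start;
}
```

A typical loop is `while ((p = next_line(buf, len, &pos, &n)) != NULL) { ... }`. A trailing newline does not yield an extra empty line, matching the "null line is not part of the text" reading, and an unterminated last line is still returned. Even so, the caller must carry the cursor from call to call.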

I think it would be a good example of inherent complexity for the young.
And I thought we were trying to make life simple!  The main problem here
is that we are trying to impose structure on unstructured data, which
is probably not the best approach.

NOTE:
Sentinels are a wonderful way of implementing lists, but a terrible way
of implementing strings.  Hint, hint.
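
To make the hint concrete, a short C sketch (all names hypothetical): a NULL next pointer works well as a list sentinel because no real node can live at address NULL, whereas a string's sentinel byte, NUL, is a value the data itself may contain. fits_as_c_string checks whether a counted byte string could be stored NUL-terminated without being silently truncated.

```c
#include <stddef.h>
#include <string.h>

/* List: NULL can never be a real node, so it is a safe sentinel. */
struct node {
    int value;
    struct node *next;
};

size_t list_length(const struct node *n)
{
    size_t count = 0;

    for (; n != NULL; n = n->next)
        count++;
    return count;
}

/* String: the sentinel byte may appear in the data itself.  Returns 1 if
 * data[0..len) contains no NUL and so survives as a C string intact. */
int fits_as_c_string(const char *data, size_t len)
{
    return memchr(data, '\0', len) == NULL;
}
```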

All of this is not to say there is no use for unstructured data.  "tr" does
a great job on unstructured data, mainly because it treats it as such.
Using "cat" to look at files, however, is probably the worst offender.  It
does not care what the data is, but attempts to display it on the
user's screen anyway.
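
A minimal C sketch of tr-style translation, assuming the data is in memory (the function translate is hypothetical and far simpler than the real tr: no ranges, no -d or -s options). It maps bytes one for one and never looks for line boundaries.

```c
#include <stddef.h>
#include <string.h>

/* Replace each byte found in `from` with the byte at the same position in
 * `to`.  Newlines are ordinary bytes here; no structure is assumed.
 * Assumes `to` is at least as long as `from`. */
void translate(char *buf, size_t len, const char *from, const char *to)
{
    unsigned char map[256];
    size_t i;

    for (i = 0; i < 256; i++)
        map[i] = (unsigned char)i;              /* identity by default */
    for (i = 0; from[i] != '\0'; i++)
        map[(unsigned char)from[i]] = (unsigned char)to[i];
    for (i = 0; i < len; i++)
        buf[i] = (char)map[(unsigned char)buf[i]];
}
```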

------------------------------------------------------------------------
From the finger tips spasmodically responding to the brain of the Master
of the Universe between the ears of---Danny J. Zerkel



More information about the Comp.unix mailing list