Recovering old UNIX manuals

Warren Toomey wkt at
Mon Jul 13 13:47:55 AEST 1998

	I'm forwarding on Norman's e-mail describing his efforts at
converting his paper-only copies of the early UNIX manuals back into
machine-readable format.


norman at writes:
> The first pass of markup is all done on chapter I of 5e, which is
> all I have scanned so far.  It is tempting to forge ahead on the
> text extracted from Dennis's 1e, but I hope to discipline myself
> to finish some surrounding documentation and tools.  On each front,
> right now there is:
> 	- a small collection of tools to pre-process what comes out
> 	of the OCR into something that is easy to mark up.
> 	Specifically there are a couple of little filters that
> 	fix up the non-ASCII characters emitted by the Mac, and
> 	that glue hyphenated words back together; and a rather
> 	bigger awk script that does some of the easy grunt work
> 	like spotting and marking up entry titles and section headers.
> 	- a description of the markup language (written in itself,
> 	of course).
> 	- a program (also in awk, and surprisingly long) to render
> 	the markup language into approximately V7 -man.  (I have
> 	actually done all the work so far on the MicroVAX in my
> 	basement, which is one of the last remaining V10 systems
> 	in the world, and it won't surprise me to learn that the
> 	renderer has accidentally picked up some V10-specific
> 	assumptions.)
> 	- a collection of advice on style and known OCR botches
> 	and whatnot for those who mark up and proof the manuals
> 	as they go through the pipe.  (At the moment `those' means
> 	me and my collaborator in California.)
>
> The most important missing tools and writings are something to render
> into HTML, and something that explains a little more generally just
> what it is I am doing (and how it differs from what Dennis did, and
> for that matter from just trying to regenerate the original troff
> input) and describes the tools and so on.  My current hope is to
> get those done in odd moments this week; once I have a decent
> approximation of each, I want to put copies of all the documents
> and all the tools and a few sample pages from 5e up on the web, so
> people have something to look at and I can get comments from a wider
> group.  (Obviously I'll drop a note to the PUPS mailing list when
> things are up there.)
>
> While I'm writing the HTML renderer and the missing document this
> week, my colleague in California has already begun an independent
> proofreading pass over the stuff I've marked up, which is a damn
> good thing because I can't see the errors any more (and she has
> already spotted some).
>
> The other tools I know are missing are:
> - some sort of structure to allow the old pre-typesetter manuals
> to be rendered in a good approximation of their original form.
> At the moment I expect this will just be a troff macro package
> with the syntax of V7 -man, so I can just use the existing renderer,
> though I can see some font issues looming that may force the
> renderer to change (perhaps in a way general enough that there will
> still be only one renderer).
> - something to allow V6-era -man (or /usr/man/man0/naa, to name it
> properly) macros to work too; the obvious cheap way out is something
> that translates V7 -man to V6, presumably with the knowledge that what
> it is translating came out of my markto7man renderer (which restricts
> the language quite a bit, so the job is a lot simpler).  I'm not sure
> how important this is--the obvious short-term goal is to be able to
> have a man command in the V5 environment, and since the macros probably
> aren't in the existing distribution, it's fair game to bring in a copy
> of the V7 ones--but it seems worth having in the long run if only for
> fun.
>
> I'd originally thought to write more of the tools before doing so
> much markup, but I'm glad I didn't--the markup language mutated more
> than I expected as experience showed where it was wrong, and it made
> life simpler to have only one renderer to update.  I think it is
> pretty much stable now, and in any case I am champing at the bit to
> be able to display things in HTML.
>
> A final complication in all this: it is all but certain that I'll
> be resigning from York this week, effective in about a month, to
> jump back to a position at the University of Toronto (running
> computers for the Canadian Institute for Theoretical Astrophysics).
> This is not a surprise to anyone concerned (including the folks here
> at York--the real reason for the move is that the eleven-mile commute
> to York is just too long for me), but it will certainly have both
> short- and long-term effects on the time I can spend on the manuals.
> The long-term effects may not be what you think, though: the scanner
> and OCR setup I've been using is located at CITA, so once I've settled
> in there (and especially once I get the tools sorted out well enough
> that it is effectively a pipeline), it should be pretty convenient
> to spend the odd hour scanning in a handful of pages.
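
For readers who want a feel for the pre-processing step Norman describes
(fixing the non-ASCII characters emitted by the Mac and gluing hyphenated
words back together), here is a minimal sketch in Python. It is not
Norman's code, which the message does not show. The function names, the
character table, and the assumption that the OCR output is in the Mac OS
Roman encoding are all illustrative guesses.

```python
import re

# Assumed table of typographic characters and their ASCII stand-ins.
# The actual set handled by the original filters is not given in the message.
ASCII_FIXES = str.maketrans({
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u2013": "-", "\u2014": "--",   # en and em dashes
    "\ufb01": "fi", "\ufb02": "fl",  # fi and fl ligatures
    "\u2026": "...", "\u00a0": " ",  # ellipsis, non-breaking space
})

HYPHEN_END = re.compile(r"[a-z]-$")
NEXT_WORD = re.compile(r"\s*([a-z]+[.,;:!?)]*)\s*(.*)$")


def fix_mac_chars(raw: bytes) -> str:
    """Decode bytes as Mac OS Roman (an assumption) and map to ASCII stand-ins."""
    return raw.decode("mac_roman").translate(ASCII_FIXES)


def glue_hyphens(lines):
    """Rejoin words that were split with a hyphen at the end of a line.

    The second half of the word is pulled up from the following line.
    A line left with nothing on it stays in place as an empty line.
    """
    lines = [ln.rstrip() for ln in lines]
    for i in range(len(lines) - 1):
        if not HYPHEN_END.search(lines[i]):
            continue
        m = NEXT_WORD.match(lines[i + 1])
        if m:  # next line starts with a lowercase word: treat it as a continuation
            lines[i] = lines[i][:-1] + m.group(1)
            lines[i + 1] = m.group(2)
    return lines
```

A sketch like this cannot tell a line-break hyphen from a real one, so a
compound such as "short-term" split across two lines would be glued into
"shortterm". That is one reason a later proofreading pass still matters.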

