[TUHS] Retypesetting Unix documents (was: Documents for UNIX Collections)

segaloco via TUHS tuhs at tuhs.org
Sun Jul 10 01:57:50 AEST 2022

Thanks for all of the background, Branden.  I would be using Groff, so this is all very valuable information; I'm glad someone with expertise is watching!

I have already noticed the differences in width, but haven't looked too far into it yet.  The effort would likely involve two phases: one to apply changes to the 3.0 docs to match the content of 4.0, then once the content is updated, the remaining matter is formatting consistency (as much as possible) with the typeset documents we have scans of.

If it ain't perfect, that's alright.  The implication would be that the original 3.0 documents on the Autologic system would be formatted similarly, so the edits would produce *ROFF input that, back then, would spit these documents out.  Perhaps identical typesetting would be a bit ambitious, but failing that, content modification shouldn't be as difficult.  Plus, processing the 3.0 docs today with Groff would produce documents with the same alignment, so for comparison purposes it'd all still be very useful.

Thanks again for the reply. I'll have to reach out when I'm more regularly working on this again; work and summer have been my primary time sinks lately :)

- Matt G.

------- Original Message -------
On Friday, July 8th, 2022 at 5:40 PM, G. Branden Robinson <g.branden.robinson at gmail.com> wrote:

> [not CCing Matt because his address didn't come through to the list]
> Hi Matt,
> At 2022-07-09T08:58:10+1000, Warren Toomey via TUHS wrote:
> > ----- Forwarded message from Matt Gilmore -----
> >
> > Subject: Documents for UNIX Collections
> >
> > Good afternoon everyone, my name is Matt Gilmore, and I recently
> > worked with some folks here to help facilitate the scanning and
> > release of the "Documents for UNIX" package as well as a few odds and
> > ends pertinent to UNIX/TS 4.0. I've been researching pretty heavily
> > the history of published memoranda and how they ultimately became the
> > formal documents that Western Electric first published with UNIX/TS
> > 5.0 and System V. Think the User's Guide, Graphics Guide, etc.
> That's excellent work--thank you for doing it!
> > One of the projects I'm working on (slowly) is comparing these
> > documents with the 4.0 docs I scanned for Arnold and making edits to
> > the *ROFF sources with the hopes I could then use them to produce 1:1
> > clean copies of the 4.0 docs, while providing an easy means for
> > diff'ing the documents as well (to flush out changes between 3.0 and
> > 4.0).
> Are you using groff to do your rendering? If so, please consider me a
> resource; I've been the most active groff developer for the past 4
> years. (I am, however, not the release manager--we're feeling heavily
> pregnant with groff 1.23, 3.5 years in the making.)
> Some of the following issues may be familiar to you; I apologize if I
> wear a rut in well-trodden ground here.
> I am wondering what you mean by "1:1 clean copies". I embarked on a
> similar exercise only about a week ago with the Kernighan & Cherry
> document "Typesetting Mathematics -- User's Guide (Second Edition)",
> which was part of Volume 2 of the V7 Unix Programmer's Manual.
> In the course of that effort I learned several things. I identified
> (and fixed) bugs in groff's ms(7) implementation, and to my surprise
> also discovered one in, apparently, V7 troff that caused an equation at
> the bottom of a column to go missing. Because groff was independently
> developed, the equation sprang back to life in its rendering. You can
> find a narrative of my experiences at the following thread, along with
> commentary from others.
> https://lists.gnu.org/archive/html/groff/2022-07/msg00000.html
> Pixel-perfect matching of C/A/T (or APS-5, etc.) output will be
> impossible because the fonts are different. More than that, the font
> metrics are different, which means lines will not always fill the same
> when comparing historical typesetter output and a modern
> implementation's (this will be true even if you use Heirloom Doctools
> Troff, which is descended from V7 Unix, but has seen many changes over
> the years, starting with Kernighan's revision for device-independence
> ca. 1980, plus many changes for the commercial Documenter's Workbench
> product, and then many more by Gunnar Ritter and his successors in the
> Heirloom project).
> Beyond that, Unix troff and groff use different hyphenation systems. I
> don't know how stable Unix troff's was over time.
> All of that said, with the Kernighan and Cherry document, by spending
> just a few minutes eyeballing old scans and groff PostScript output,
> flicking between two fullscreen viewers like an ersatz blink comparator,
> and using binary search to tweak the ms(7) LL, PO, and MINGW registers,
> I was able to almost perfectly match column and page breaks between
> the two renderings, which was a higher fidelity of reproduction than I
> expected. The risen equation noted above was the most dramatic change.
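
The manual binary search described above could in principle be automated. The following is only a minimal sketch in Python, not what Branden actually did: `bisect_register` and its `too_small` callback are hypothetical names, and the callback is assumed to render the document with a candidate register value (in whatever integer unit you choose) and report whether the break still falls too early compared with the scan.

```python
from typing import Callable


def bisect_register(lo: int, hi: int, too_small: Callable[[int], bool]) -> int:
    """Return the smallest value in [lo, hi] for which too_small() is False.

    Assumes too_small() is True for every value below some threshold and
    False from the threshold upward, and that too_small(hi) is False.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if too_small(mid):
            lo = mid + 1   # threshold lies strictly above mid
        else:
            hi = mid       # mid is acceptable; look for something smaller
    return lo


if __name__ == "__main__":
    # Stand-in predicate for illustration only: pretend any line length
    # below 432 units breaks the column too early.
    print(bisect_register(300, 500, lambda v: v < 432))
```

The search only makes sense if the comparison behaves monotonically over the range tried; with three interacting registers (LL, PO, MINGW) that is an assumption worth checking by eye, one register at a time.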
> Encouraged by that experience, I also reset the V7 Unix version of the
> article "A System for Typesetting Mathematics". This apparently was
> not published in the Programmer's Manual, possibly because much of its
> content was duplicated in the user's guide. But the amount of effort
> required of me was shockingly low. On the other hand, for this I didn't
> have an authentically typeset copy to compare to, so all I did was look
> for what I would consider rendering errors as opposed to cosmetic
> changes. (Maybe this is the standard you want to apply in your own work?)
> I'm attaching a diff.
> Another apparent difference arises between V7 Unix eqn and groff eqn; in
> eqn input such as "lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1", V7
> eqn will recognize "->" as beginning a new token and convert it to a
> right arrow glyph in the output, despite the manual (as I understand it)
> implying that it won't. groff eqn does require token separation in
> this case.
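
When porting old eqn input to groff eqn, the token separation mentioned above could be added mechanically. Here is a rough Python sketch; `separate_arrows` is a hypothetical helper, it assumes it is only fed eqn text (between .EQ/.EN or inline delimiters), and it does not try to protect double-quoted strings.

```python
import re

# Match "->" together with any surrounding whitespace, unless it directly
# follows a troff special-character introducer, \( or \[, so that escapes
# such as \(-> are left untouched.
_ARROW = re.compile(r"(?<!\\[(\[])\s*->\s*")


def separate_arrows(eqn_source: str) -> str:
    """Surround each bare "->" with single spaces so it is its own token."""
    return _ARROW.sub(" -> ", eqn_source)


if __name__ == "__main__":
    print(separate_arrows("lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1"))
```

Applied to the example above, only the "x->" is changed; the rest of the equation passes through as written.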
> I say that differences are "apparent", rather than making the stronger
> claim of outright bugs in V7 Unix tools mainly because I don't have a
> cat2dit(1) tool I can run in my V7 Unix environment in SIMH. In my
> opinion such a tool (in K&R C, of course) would be well worth having.
> Right now, to satisfy myself of V7 Unix troff behavior I have to produce
> an octal dump of the typesetter output, pull it out of the emulation
> environment with copy-and-paste, undump it with a custom program (xxd is
> not helpful), and then give the reconstructed C/A/T stream to an
> interpreter written by John Gardner in JavaScript. John's tool (and his
> personal assistance) has proven invaluable, but it's a component of a
> larger project of his that renders device-independent troff output in a
> Web browser window. For this to be practical he has to introduce
> additional device-independent troff commands into the output. I'd
> prefer something more rabidly puritan (and, if I'm honest, something
> written in a more traditional Unix system programming language).
> https://github.com/Alhadis/Roff.js/
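
For anyone repeating the dump-and-undump step, the "undump" half might look like the Python sketch below. This is not Branden's custom program. It assumes the dump was produced with `od -b` (one octal offset followed by up to 16 octal byte values per line, a lone `*` standing for repeated lines, and a final offset-only line); `undump_od_b` is a hypothetical name.

```python
def undump_od_b(text: str) -> bytes:
    """Rebuild the original bytes from the text of an `od -b` style dump."""
    out = bytearray()
    prev = b""       # bytes carried by the most recent data line
    elided = False   # True after a "*" line (repeats of prev were omitted)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "*":
            elided = True
            continue
        offset = int(fields[0], 8)
        if elided:
            if not prev:
                raise ValueError("'*' line with no preceding data line")
            # Re-create the omitted copies of prev up to this line's offset.
            while len(out) < offset:
                out += prev[: offset - len(out)]
            elided = False
        chunk = bytes(int(f, 8) for f in fields[1:])
        out += chunk
        if chunk:
            prev = chunk
    return bytes(out)


if __name__ == "__main__":
    sample = "0000000 101 102 103 012\n0000004\n"
    print(undump_od_b(sample))
```

Text pasted out of an emulator window may pick up stray prompts or wrapped lines, so it is worth checking the reconstructed length against the final offset line before handing the stream to a C/A/T interpreter.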
> The big advantage of a V7 Unix/PDP-11 cat2dit(1) would be that
> device-independent troff output is plain text and much easier to spirit
> out of the emulated environment to the host system. Also, some people,
> who may be pitied, have taught themselves to read it, making more
> observations possible and hypotheses testable within the PDP-11
> environment. (In principle, this is also true of C/A/T command streams,
> whether raw or octal-encoded, but I'll just let the pity roll downhill.)
> Thanks largely to Henry Spencer, the information to write a new
> cat2dit(1) from scratch is available. Eventually, if no one else does
> so, I will undertake it myself; but my queue is deep (mostly with groff
> defect reports and feature requests).
> https://github.com/Alhadis/otroff/blob/92683053f9aad5b926fc447843bf2092ad59cebf/cat.5
> Dan Plassche pointed me toward Adobe Transcript, but my understanding is
> that it falls short of my needs in 3 ways: it produces PostScript, which
> I can't easily read, not device-independent troff output (which I can);
> it's not available in a version ready to run in a modern Unix
> environment; and it has a licensing encumbrance. I'd like a cat2dit(1)
> we can all trade around libre and gratis.
> Alternatively, if someone leaked the troff sources from UNIX/TS 4.0,
> that would bring a grin of Jack Nicholsonian proportions to my face.
> That should be buildable in vivo on a PDP-11 and would facilitate much
> other historical research besides. (With it, someone could annotate a
> diff of the troff/nroff source trees between V7 and UNIX/TS 4.0, which I
> wager constitutes a highly positive and teachable moment in software
> design and engineering.)
> Okay, brain dump terminated. Please let me know if I can help.
> Regards,
> Branden
