[TUHS] Yacc binary on 4th edition tape

Mon Jan 12 08:21:39 AEST 2026

Jonathan Gray very kindly found two sources for “yopti” in the TUHS archives.

The first is in the UNSW 110 archive (https://www.tuhs.org/Archive/Distributions/UNSW/110/). The archive is from 1981, but it appears to be the 1975 yacc of 6th edition.

This yacc has the source for the optimizer ("yopti.c”). It reads the the y.tab.c file that plain 6th edition yacc generates, does some processing and writes out a new y.tab.c. It also comes with a new yyparse routine and this routine is essentially identical to the yyparse of 7th edition (which I think is more or less the final version of classic yacc).

The other is in the PWB1 distribution (https://www.tuhs.org/cgi-bin/utree.pl?file=PWB1/sys/source/s2/yacc.d), from 1977. The optimizer is now in the source file "y5.c” and largely, but not fully the same. It has a new way to compress the “yyact” table. The y5.c file can build both as a separate program and as a subroutine of the yacc core, reusing table space from the earlier phases. The new way to compress is also used in 7th edition, but the optimizer has been fully integrated in the core yacc codebase.

I’ve also disassembled the first part (y1.c) of the yacc binary in 4th edition. There is more instrumentation (regular calls to “times()” to track time spent) and there is no code to call out to an optimizer pass, suggesting that it might not have existed yet (but of course absence of proof is not proof of absence).

The development of table sizes is interesting, in particular “yyact”; I have tested with the 1978 grammar for the Eh compiler.

- The 1975 core yacc generates a yyact table with about 1650 entries. Here the yyparse routine has to do a lot of searching through the tables whilst parsing: the representation is compact, but slow to process.

- The 1975 optimizer changes the table structure to avoid the searching (arriving at about the final form of “yaccpar”), but the yyact table that it produces is huge: the 1600 entries are expanded into almost 11,000 entries, many of which are “0”. Probably it was considered work-in-progress at this stage and hence not included on the 6th edition tape. (I had to fix one bug in yopti.c to get it to run; maybe this was a bad fix but at the moment I don’t think so.)

- The algorithm in the 1977 PWB1 version is a spectacular improvement: the “yyact” table goes down in size to about 650 entries. Note that “yaccpar” does not change: it is ‘only' a superior algorithm to find the optimal compression of a sparse table.

All in all, my hypothesis on the timeline would now be:

- 1971: first version of yacc created in B
- 1972: improvements to make it more practical
- 1973: yacc introduced to Waterloo (relevant for Eh)
- 1973: conversion from B to C
- 1974: improvements on speeding up table generation
- 1975: improvements on speeding up yyparse execution
- 1976: improvement on reducing optimized table size
- 1977/78: cleaning up code base, portability, tracking C changes
- 1979: more or less final version of classic yacc

The above matches with interviews with Johnson and Aho, where both say that yacc was improved over a number of years and with about a dozen rewrites in that period.

====

Now back to the timeline for the Waterloo Eh compiler.

- I think it is a fact that the early B compiler on the Honeywell 6000 was a two-pass compiler, three if you count the assembler. Steve Johnsons writes "The './bj' command works as follows: the first two passes, ./b1 and ./b2, of the B compiler are run in MH-TSS. The result of the second pass of the compiler is a GMAP program, which must be sent to the batch world, compiled, and loaded with the B I/O library (on file ./blib) to create an executable H* file. The ./bj command calls ./b1 and ./b2, and if no errors are detected, the GMAP deck is submitted to the batch world; currently, this is done using 'jrun’.” (https://www.nokia.com/bell-labs/about/dennis-m-ritchie/bref.html). I don’t know much about the H6000, but apparently GMAP is the system macro assembler. Not a fact, but I assume that the “b1” pass is Ken Thompson’s B front end (lexing, parsing, threaded/intermediate code generation) and the “b2” pass is Dennis Ritchie’s "tour de force" native code generator.

- In November 1975 the Thoth project writes that they like the B compiler, and “this is largely due to the fact that it is a syntax directed compiler for a language which has a simple and compact syntax. The one-pass compiler is implemented in B.” This sounds like a different compiler. The B compiler above is clearly not one pass and only syntax directed in the sense that any compiler is syntax directed. This compiler remains a bit of a mystery. Maybe it is something that Steve Johnson wrote during his sabbatical there.

- However, in February 1976 they write (CS-76-11): "As the first stage of the project, we have designed a high-level implementation language (called Eh) which will be common to all machines. […] The Eh compiler consists f two phases: the first phase does the syntax analysis and outputs intermediate language. The intermediate language resembles the order code for a stack oriented machine. During the second phase of compilation, the intermediate language is translated to the target machine language.”. This sounds like reverting to the structure of the original Honeywell B compiler, with the difference that it uses yacc in its front-end and the back-end outputs relocatable object code, not assembler source.

[as an aside: the 1978 version of this Eh intermediate code is well documented and it has some intriguing similarities to the PDP-7 threaded code for B as reconstructed by AAP; it needs more investigation, but it suggests that the Eh intermediate code was influenced by the intermediate code between the b1 and b2 passes of the original H6000 B compiler.]

The question now is what version of yacc they would have used for the Eh compiler. It may be the yacc that Steve Johnson brought to Waterloo in 1973, but in view of the above timeline, it was probably a bit unwieldy to use at that point in time. Another possibility is that they used yacc from Unix 6th edition (the Waterloo Math department had a PDP-11 running Unix), and back-ported that later to Eh later to make their tool chain self hosting. With V6 being dated around May 1975, it would stand to reason that they had V6 running by the end of that year. From what I understand so far, it is unlikely to have used the optimizer, as that appears to have been unpolished / unfinished until a bit later in time.