[TUHS] Yacc binary on 4th edition tape
Paul Ruizendaal via TUHS
tuhs at tuhs.org
Wed Jan 7 19:14:05 AEST 2026
Thank you for these clarifications.
Refetching the v4 archive indeed brings a complete yacc binary, and it is unstripped. I guess at some point I need to find a tool that can disassemble a PDP-11 a.out file, taking into account the symbols.
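As a starting point for such a tool, here is a minimal sketch in Python that reads the header and symbol table of a PDP-11 a.out file and checks whether the file is shorter than its header implies. It only reads the file; it does not disassemble anything. It assumes the layout described in the 6th edition a.out manual page (an 8-word header, then text, data and 12-byte symbol entries) and that relocation bits have been suppressed. Whether the V4 binary matches this layout is an assumption I have not verified, and the names parse_aout, HEADER and SYMBOL are just illustrative.

```python
import struct
import sys

# Assumed V6-style layout; not verified against the V4 tape.
HEADER = struct.Struct("<8H")    # eight little-endian 16-bit words
SYMBOL = struct.Struct("<8sHH")  # 8-byte name, flag word, value word


def parse_aout(image: bytes) -> dict:
    """Read the header and symbols of a PDP-11 a.out image (hypothetical helper)."""
    if len(image) < HEADER.size:
        raise ValueError("file shorter than the a.out header")
    magic, text, data, bss, syms, entry, _unused, noreloc = HEADER.unpack_from(image, 0)
    if magic not in (0o407, 0o410, 0o411):
        raise ValueError("unexpected magic number %o" % magic)
    if not noreloc:
        # Where relocation bits sit differs between editions, so bail out.
        raise ValueError("relocation bits present; not handled in this sketch")

    sym_start = HEADER.size + text + data
    expected = sym_start + syms          # size the header says the file should have
    end = min(expected, len(image))      # do not read past a truncated file

    symbols = []
    for off in range(sym_start, end - SYMBOL.size + 1, SYMBOL.size):
        raw, flags, value = SYMBOL.unpack_from(image, off)
        name = raw.rstrip(b"\0").decode("ascii", "replace")
        symbols.append((name, flags, value))

    return {
        "magic": magic, "text": text, "data": data, "bss": bss, "entry": entry,
        "expected_size": expected, "actual_size": len(image),
        "truncated": len(image) < expected, "symbols": symbols,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        info = parse_aout(f.read())
    print("magic %o, text %d, data %d, bss %d" %
          (info["magic"], info["text"], info["data"], info["bss"]))
    print("expected %d bytes, got %d%s" %
          (info["expected_size"], info["actual_size"],
           " (truncated)" if info["truncated"] else ""))
    for name, flags, value in info["symbols"]:
        print("%06o %03o %s" % (value, flags, name))
```

Run against the yacc binary, the "truncated" field would show whether the file still stops short of the size its header implies; the symbol values could then be used to label addresses in whatever disassembler is used.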
Indeed, the tape is from 1974, not 1973 — which makes quite a difference as things were moving fast in that era — but still closer to the B version than the yacc sources in v6. Thank you for highlighting this.
Paul
> On 6 Jan 2026, at 17:31, Thalia Archibald <thalia at archibald.dev> wrote:
>
> Hi Paul,
>
> Excellent history on yacc and Waterloo! I’ll be adding your sources to my early
> UNIX history project :).
>
> I was aware of Wes Graham’s work on WATFOR at Waterloo and that the Computer
> Systems Group acquired UNIX in 1976 (which wasn’t necessarily the first group at
> Waterloo to do so), but I didn’t know of anything earlier. I found some
> interesting UNIX documents in the University of Waterloo archives and have
> uploaded them.
> https://archive.org/search?query=subject%3A%22University+of+Waterloo+Archives%22
>
> Also, there are some items in the University of Waterloo Computer Museum that
> I’m hoping to get more information on, once I follow up with the curator.
> https://github.com/thaliaarchi/unix-history/blob/main/users/waterloo/butterworth.md
>
>> All this makes the Yacc binary on the 4th edition tape interesting to me, as it
>> gives a window on the state of Yacc late in 1973 when Johnson returned to Bell
>> Labs. The binary appears truncated at the 16kb mark
>
>
> If you’re using Angelo’s tar, you should fetch it again. He’s fixed some bugs
> that truncated several files, including yacc.
> http://squoze.net/UNIX/v4/unix_v4.tar
>
> Also you should know the tape dates to 12 June 1974, but the manual received was
> V4, so it’s V5 minus a week or so (though I only know that the V5 manual dates
> to the month of June). Although you could consider it a near-clean V5, I still
> call it V4, since the system was versioned by its manual.
>
> Thalia
>
>> On Jan 6, 2026, at 04:48, Paul Ruizendaal via TUHS <tuhs at tuhs.org> wrote:
>>
>>
>> Perusing the 4th edition archive I noticed that the usr/bin directory has a binary for Yacc. This reminded me of a project on my to-do list: recreating the yacc used at the Uni of Waterloo for their Thoth project. Unfortunately, there was no source for Yacc on the 4th edition tape. The oldest version I am aware of is the source as included with 6th edition. However, this looks quite promising.
>>
>> I offer the below timeline analysis for some sanity checking by the people who were there and have some specific questions at the end.
>>
>> For background: my interest is driven by an underlying interest in the “Eh” and “Zed” languages that evolved from B at the Uni of Waterloo. DMR mentions these languages in his paper on the history of C (https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist.pdf).
>>
>> First on the timeline:
>>
>> - By 1970 there were B compilers written in B for the PDP-7, the GE600 / Honeywell 6000 and for the PDP-11. The GE compiler generated machine code, not threaded code (DMR writes “The most ambitious enterprise I undertook was a genuine cross-compiler that translated B to GE-635 machine instructions, not threaded code. It was a small tour de force: a full B compiler, written in its own language and generating code for a 36-bit mainframe, that ran on [the PDP-7,] an 18-bit machine with 4K words of user address space.”). As the compiler was written in B, I assume this means that it subsequently also ran on the GE itself. This compiler seems to have been the basis for the nascent C compilers (AAP writes “According to dmr's history of the C language NB had a machine code generator and ken told me (by email) that dmr's work on the code generator started on the Honeywell mainframe and that NB was always in machine code.” - http://squoze.net/NB/README).
>>
>> - DMR also writes in that paper: “By 1971, our miniature computer center was beginning to have users. We all wanted to create interesting software more easily. Using assembler was dreary enough that B, despite its performance problems, had been supplemented by a small library of useful service routines and was being used for more and more new programs. Among the more notable results of this period was Steve Johnson’s first version of the yacc parser-generator”. So, Yacc first appears in 1971 and is written in B. As such, it ran on both the PDP-11 and the GE/Honeywell.
>>
>> - I would hypothesize that the c0/c1 structure of the early 1972/1973 C compilers goes all the way back to the GE/Honeywell implementation of B. In this respect it is suggestive that the “last1120” C compiler names its passes “nc0” and “nc1”, following shortly on the transitional “new B” / “nb”. If true, it would stand to reason that this mainframe B compiler also used a similar recursive descent / operator precedence parsing scheme.
>>
>> - The DMR history paper then goes on to say that Johnson had a sabbatical at the University of Waterloo in 1972, but I think this might be a slip of the pen. A Uni of Waterloo retrospective says that he arrived late in 1972 (“In August 1972, […] a new arrival was causing a stir in the Math & Computer building at University of Waterloo – a brand new Honeywell 6050 mainframe size computer. […] Shortly after the arrival of the Honeywell, Steve Johnson came to the Math Faculty on sabbatical from Bell Labs.”). He brought B and Yacc with him ("I suspect that few people realize his key role in introducing Bell Labs culture to University of Waterloo so early, including B Programming Language, getchar(), putchar(), the beginnings of the notion of software portability and, of course, yacc.”). https://randalljhoward.com/tag/dead-whale/ The year 1973 is also supported by a resume from 1982 ("I spent a 9-month Sabbatical in 1973 at the University of Waterloo, where I taught courses in Advanced Applications Techniques and Algebraic Manipulation.” — https://stacks.stanford.edu/file/druid:ws821cy1376/ws821cy1376.pdf). 1973 is also a better match with the internship of Alan Snyder in that year.
>>
>> - In an interview Johnson mentions "When YACC first ran, it was very slow […] I set out to improve the size and space characteristics. Over the next several years, I rewrote the program over a dozen times, speeding it up by a factor of 10,000 or so. Many of my speedups involved proving theorems that we could cut this or that corner and still have a valid parser. The introduction of precedence was one example of this.” (https://www.computerworld.com/article/1570304/yacc-unix-and-advice-from-bell-labs-alumni-stephen-johnson.html). I suspect that a fair bit of this improvement happened in 1972, because he continues with "Dennis was actively working on B while I was writing YACC. One day, I came in and YACC would not compile – it was out of space. It turns out that I had been using every single slot in the symbol table. The night before, Dennis had added the ‘for’ statement to B, and the word ‘for’ took a slot, so YACC no longer fit!”. This suggests 1972 much more than 1974 as the timeframe he had in mind when saying this.
>>
>> - This also tallies with DMR’s account, writing: “When Steve Johnson visited the University of Waterloo on sabbatical in 1972, he brought B with him. It became popular on the Honeywell machines there, and later spawned Eh and Zed (the Canadian answers to ‘what follows B?’). When Johnson returned to Bell Labs in 1973, he was disconcerted to find that the language whose seeds he brought to Canada had evolved back home; even his own yacc program had been rewritten in C, by Alan Snyder.” As explained above, I think this should be read as “late 1972” and “late 1973”. So: a first, early C version of Yacc can be placed in mid-1973.
>>
>> - Alan Snyder did the Honeywell version of his portable compiler in 1973 (the PDP-10 version and his thesis are from 1975) (https://retrocomputingforum.com/t/some-materials-on-early-c-and-the-history-of-c/3016/2). This compiler used yacc, which implies that by (late) 1973 yacc must have been stable, fast and compact enough to handle a sizable grammar. I can understand converting it to nascent C, as I have recently found yacc to be a great compiler test input. In the timeline, this Snyder version is close to the binary on the 4th edition tape.
>>
>> - B evolved at Waterloo. Report CS-75-23 from September 1975 says “Current efforts center on the language ‘B’ which is already implemented on the HIS 6050 and PDP 11; we hope to have a version of B for the Microdata before January, 1976. […] The problem is now reduced to that of recoding the B compiler code generation section and the basic I/O primitives.” And report CS-75-29 from November of that year says “The B compiler is well suited for our preliminary experimentation with portability because it is nicely structured and therefore easily modified to generate code for other machines. This is largely due to the fact that it is a syntax directed compiler for a language which has a simple and compact syntax. The one-pass compiler is implemented in B.” I assume “syntax directed” in this context to mean that the Honeywell B compiler was recoded to use Yacc for its parser, somewhere between 1973 and 1975. If so, that effort probably used the B version of Yacc that Johnson brought in 1973. The 1976 Eh and the 1978 Zed compilers certainly use Yacc to build their parsers.
>>
>> - All this makes the Yacc binary on the 4th edition tape interesting to me, as it gives a window on the state of Yacc late in 1973 when Johnson returned to Bell Labs. The binary appears truncated at the 16kb mark, but a first quick look at the strings suggests it is quite similar to the surviving 6th edition Yacc source code. Similar, but not fully identical. This is in a context where the surviving 1975 Yacc source looks decidedly 1973 in style. For instance the yyparse function in file “parser.c” looks like a B function that has been minimally edited to make it early C (https://www.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/yacc/lib/parser.c). Another example is in y2.c, where in function “setup()” the “foutput” variable is set to -2 by default; I believe this to be a remnant from B on the Honeywell, where that means to output to the batch console.
>>
>> I wonder how much this 1975 yacc source has diverged from the 1973 Snyder B => C port; I presume not much. In fact, this version of Yacc proved quite easy to revert to B (or Eh, actually): https://gitlab.com/pnru/Thoth/-/tree/master/user/yacc
>>
>> - An interesting sidetrack is the evolution of Johnson’s Yacc manual / paper. Several versions appear to exist, all a bit different. Later versions (1979 ?) have the ‘=’ sign before actions as a deprecated feature, but the 1975 source code still insists on the ‘=’ sign. AAP appears to have the oldest version (1975?) of the document and this version still has the equals sign as mandatory in its text (http://squoze.net/UNIX/v6/files/doc/yacc.pdf). In the 7th edition manual and code base the use of this ‘=’ has become optional. I wonder when and why this change was made; the old syntax seems harmless.
>>
>> - The 6th edition Yacc has an optimizer pass, “/usr/yacc/yopti”, which was optionally run after Yacc completed. As far as I can tell, the source for this optimizer is lost. I have found no materials explaining this optimizer pass.
>>
>> - Between 6th edition and 7th edition the code base changes substantially, presumably for further compaction and speed-up. It grows from ~1700 to ~2200 lines. The optimizer pass is integrated into the base package, support for Ratfor is dropped, etc. The source also starts to look like ‘real C’. Alternatively, the yacc source in 6th edition might not reflect the latest internal Bell version and actual yacc development was perhaps more gradual. Although I use the 7th edition version in my C recreation of the Eh compiler, it does not seem to be a good base to approximate what the Uni of Waterloo might have used in 1975-1977.
>>
>> Now for the questions:
>>
>> - Do the above timeline and assumptions sound correct (or at least plausible) to those who were there?
>>
>> - Does anybody know of Yacc source code older than what is included in 6th edition (other than attempting to reverse engineer the recently recovered 4th edition binary)?
>>
>> - Does anybody know more about the missing Yacc optimizer in 6th edition, what it did, etc.? Or is the only way to compare and contrast with 7th edition where the (that?) optimizer is integrated?
>>
>>
>>
>>
>
>