[TUHS] On the uniqueness of DMR's C compiler

Fri May 10 06:40:28 AEST 2024

Thanks everybody for the feedback and pointers, much appreciated!

The main point is clear: the premise that the DMR C compiler had unique (native, small machine) code generation during most of the 70’s does not hold up.

Clem Cole is correct in observing that (certainly for the 70’s) I’m skewed towards stuff from academia, with a blind spot for the commercial compilers of that era.

Doug McIlroy’s remarks on Digitek were most helpful and I’ll expand a bit on that below.

I was aware of the Digitek / Ryan-McFarland compilers before, but in my mind they compiled to a virtual machine (misunderstanding a description of “programmed operators”, and because their compilers for microcomputers did so in the 80’s). Digging into this more led me to a 1970 report “Programming Languages and their Compilers, Preliminary Notes” by John Cocke and J.T. Schwartz:
https://www.softwarepreservation.org/projects/FORTRAN/paper/Bright-FORTRANComesToWestinghouseBettis-1971.pdf

It is a nearly 800-page review of then-current languages and compilers. It includes some discussion of the Digitek compilers as the state of the art for small machines, with some further description of how they worked (pp. 233-237, 749). It also mentions their PL/1 for Multics fiasco (for background, see https://www.multicians.org/pl1.html).

- The Digitek compilers were indeed small enough to run on PDP-11-class machines and even smaller ones, and they produced quite reasonable native code. In this sense they occupied the same spot as the DMR C compiler, which was hence not unique in this regard -- as Doug points out.

- They consisted of two parts: a front end coded in “Programmed Operators” (POPS) generating an intermediate language, and a custom-coded back end that converted the IL to native code.

- POPS were in effect a VM for compiler construction (although expressed as assembler operations). To move a compiler to a new machine only the POPS VM had to be recoded, which was a very manageable job. From the description in the above book it sounds very similar to the META 3 compiler generator setup, but expressed in a different form.

- Unfortunately, I have not been able to find a description of the POPS IL.

- The smaller Digitek compilers did only a limited amount of optimisation, carried out in the code generation phase. The optimisations described sound quite similar to what the DMR C compiler did in its c1 phase (special-casing +1 and -1, combining constants, mul/div to shift, etc.).

- Code generation seems to have been through code snippets for each IL operation, selecting one of 3 addressing modes (register, memory and indexed), although the text isn’t quite clear on this. It sounds reasonable for small machines in the 60’s.

- The later Ryan-McFarland microcomputer compilers seem to have used the same POPS-based front-end technology, but with an interpreter executing the IL directly.
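
To make the last few points a bit more concrete, here is a toy sketch of snippet-per-operation code generation with the kind of special-casing mentioned above. It is purely my own illustration: the IL shape, the function names (`operand`, `fold`, `gen`) and the PDP-11-flavoured pseudo-assembly are all assumptions, and it is not a reconstruction of what Digitek or the DMR compiler actually did.

```python
# Toy illustration only: hypothetical IL and PDP-11-flavoured pseudo-assembly.

def operand(o):
    """Render an operand in register, memory or indexed mode (or as a constant)."""
    kind = o[0]
    if kind == "reg":      # register, e.g. ("reg", "r0")
        return o[1]
    if kind == "mem":      # named memory location, e.g. ("mem", "x")
        return o[1]
    if kind == "idx":      # indexed, e.g. ("idx", 4, "r5") -> 4(r5)
        return f"{o[1]}({o[2]})"
    if kind == "const":    # immediate constant, e.g. ("const", 8) -> $8
        return f"${o[1]}"
    raise ValueError(f"unknown operand kind: {kind}")

def fold(op, a, b):
    """Combine two constants at compile time."""
    return {"add": a + b, "sub": a - b, "mul": a * b}[op]

def gen(op, src, dst):
    """Emit pseudo-assembly for `dst = dst op src`, special-casing cheap forms."""
    d = operand(dst)
    if src[0] == "const":
        n = src[1]
        if op in ("add", "sub") and n == 0:
            return []                          # adding or subtracting 0: no code
        if (op, n) in (("add", 1), ("sub", -1)):
            return [f"inc {d}"]                # +1 becomes an increment
        if (op, n) in (("add", -1), ("sub", 1)):
            return [f"dec {d}"]                # -1 becomes a decrement
        if op == "mul" and n > 0 and (n & (n - 1)) == 0:
            # multiply by a power of two becomes repeated left shifts
            return [f"asl {d}"] * (n.bit_length() - 1)
    # general case: one snippet per IL operation
    return [f"{op} {operand(src)},{d}"]
```

For example, `gen("mul", ("const", 8), ("reg", "r1"))` yields three `asl r1` lines, while `gen("add", ("mem", "x"), ("reg", "r0"))` falls through to the general snippet `add x,r0`.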

Interestingly, the above book has a final chapter about “the self-compiling compiler”. To quote: “The scheme to be described is one which has often been considered, and in some cases even implemented. It involves the use of a compiler written in its own language, and capable therefore of compiling itself over to a new machine.” It proceeds to describe such a compiler in quite some detail, including using a table driven code generator.
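
As a rough idea of what “table driven” can mean here, the sketch below keeps one generic driver and swaps only a table of templates per target. The two targets, their mnemonics and the names `TARGET_A`, `TARGET_B` and `generate` are invented for illustration; the scheme in the report is considerably more elaborate and I am not claiming this matches it.

```python
# Toy illustration only: both "machines" and their mnemonics are made up.

# Two-address, PDP-11-flavoured target: operations work on a register {r}.
TARGET_A = {
    "load":  ["mov {a},{r}"],
    "add":   ["add {a},{r}"],
    "store": ["mov {r},{a}"],
}

# Single-accumulator target: the register is implicit in the mnemonic.
TARGET_B = {
    "load":  ["lda {a}"],
    "add":   ["adda {a}"],
    "store": ["sta {a}"],
}

def generate(il, table, reg="r0"):
    """Expand each IL operation through the target's template table."""
    out = []
    for op, arg in il:
        for template in table[op]:
            # str.format ignores keyword arguments a template does not use
            out.append(template.format(a=arg, r=reg))
    return out

# Hypothetical IL for `z = x + y`
IL = [("load", "x"), ("add", "y"), ("store", "z")]
```

Calling `generate(IL, TARGET_A)` and `generate(IL, TARGET_B)` runs the same driver over the same IL; only the table changes, which is the property that makes moving such a compiler to a new machine manageable.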

Seen through this lens, the DMR C compiler could be viewed as a re-imagining of the Digitek small-system compilers, using a self-compiling lexer/parser instead of POPS (or TMG or META) and an (also self-compiling) code generator evolved to handle the richer PDP-11 addressing modes. The concept seems to have been in the air at that time.

Now I am left wondering why the IL-to-native back-ends were not more used in academic small machine compilers in the 70’s -- but this too may be the result of a skewed view on my part.