[TUHS] Compilation "vs" byte-code interpretation, was Re: Looking back to 1981 - what pascal was popular on what unix?

Phil Budne phil at ultimate.com
Mon Jan 31 11:41:11 AEST 2022


> Is there consistency here?

There's a wide spectrum of strategies used for implementation of
languages, and no perfect and universally agreed on taxonomy.

	(And in networking, where there is an "International Standard"
	taxonomy, both the original ARPAnet and the modern Internet
	don't fit into the (ISO) model!)

At the ends of the spectrum you might get people to agree on the "pure
interpreter", which interprets source code DIRECTLY, and the "native
code compiler", which generates instructions for the instruction set
of a physical computer (typically the one the compiler is running on,
with the term "cross compiler" used when the target architecture is
different from the one the compiler is running on).

I don't doubt this has been brought up many times in the "comp.compilers"
group: https://compilers.iecc.com/

To bring the discussion back to "Unix Heritage":

The earliest Unix shells were pure interpreters
(and for all I know, most still are).

Some BASIC language systems have been pure interpreters, but it gets
murky fast: some interpretive systems have converted source code to
tokens in memory, or even saved the tokenized form to disk.
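
To make the "tokens in memory" idea concrete, here is a minimal sketch
in Python. The keyword table and token values are made up for
illustration (every real BASIC had its own), and unlike a real
tokenizer this one does not skip over string literals.

```python
# Hypothetical keyword table; token values are invented for this sketch.
KEYWORDS = {"PRINT": 0x80, "GOTO": 0x81, "LET": 0x82, "IF": 0x83, "THEN": 0x84}

def crunch(line: str) -> bytes:
    """Replace each keyword with a one-byte token; copy other characters."""
    out = bytearray()
    i = 0
    while i < len(line):
        for word, token in KEYWORDS.items():
            if line.startswith(word, i):
                out.append(token)
                i += len(word)
                break
        else:
            # No keyword starts here: keep the (ASCII) character as-is.
            out.append(ord(line[i]))
            i += 1
    return bytes(out)

def list_line(crunched: bytes) -> str:
    """Expand tokens back into keywords, as a LIST command would."""
    names = {token: word for word, token in KEYWORDS.items()}
    return "".join(names.get(b, chr(b)) for b in crunched)
```

The interpreter then dispatches on one byte per keyword instead of
re-scanning the spelling every time the line is executed.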

Beyond pure interpreters, most interpreters perform some kind of
compilation into some alternate representation of the program, often
starting with (and sometimes, as in LISP, ending with) a tree.  Often
the tree is traversed to produce a prefix or postfix "polish" form,
which might or might not be written out (as a byte code, or other
intermediate form).
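
As a rough illustration of the tree-to-postfix step, here is a small
Python sketch. The node layout (a tuple of operator, left, right, with
bare numbers and names as leaves) is an assumption for the example, not
what any particular system used.

```python
# A node is either a leaf (a number or a variable name) or a tuple
# (operator, left, right). This layout is assumed for illustration only.

def to_postfix(node):
    """Post-order walk: emit both operands first, then the operator."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]

# The expression (a + 2) * b as a tree.
tree = ("*", ("+", "a", 2), "b")
postfix = to_postfix(tree)
```

Here postfix comes out as a, 2, +, b, * and that list could either be
executed directly or written out as an intermediate form.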

The earliest Unix language systems (TMG and B) on both the PDP-7 and
PDP-11 are interesting in that they output "word code" that is
assembled by as, and loaded with ld to produce "regular" executable
files which contain interpreters.

The earliest (PDP-7) Unix compilers, TMG and B both generated code for
(stack-oriented, postfix) pseudo machines (which happened to have
opcode fields the same size and position as the PDP-7 itself).

Since PDP-11 pointers can be a full 16-bit word, PDP-11 TMG and B
generate a stream of 16-bit postfix code (with pointers to interpreter
and "native code" support routines).  TMG contains an interpreter
loop, but the B interpreter is "threaded code" using machine register
r3 for the interpreter program counter, and each interpreter opcode
routine ends with "jmp *(r3)+"
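
A loose Python analogy for a postfix stream of routine pointers follows.
All names here are hypothetical. Python has no way to end each routine
with an indirect jump, so the sketch uses a central dispatch loop, which
is closer to what is described above for TMG than to B's threaded code;
the variable pc stands in for r3.

```python
def run(code):
    """Execute a list of 'words': routine references with inline operands."""
    stack = []
    pc = 0                      # interpreter program counter (r3 in B)
    while pc < len(code):
        routine = code[pc]      # fetch the next word: a routine reference
        pc += 1
        pc = routine(code, pc, stack)   # routine may consume inline operands
    return stack

def push(code, pc, stack):
    stack.append(code[pc])      # the operand is the next word in the stream
    return pc + 1

def add(code, pc, stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)
    return pc

def mul(code, pc, stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)
    return pc

# (3 + 2) * 4 in postfix order, as a stream of routine references.
program = [push, 3, push, 2, add, push, 4, mul]
result = run(program)
```

In true threaded code the loop in run() disappears: each routine does
the fetch-and-jump itself as its last instruction.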

I haven't examined Sixth Edition "bas" (written in assembler) closely
enough to say what kind of internal representation (if any) it uses.
"bc" generates postfix "dc" code using a yacc parser, and "sno"
appears to recursively eval a tree.

Seventh Edition awk looks to recursively execute a tree generated by a
yacc parser.
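
For contrast with postfix code, recursively executing a tree can be
sketched in a few lines of Python. The node layout and the env
dictionary for variables are assumptions for the example; real awk and
sno nodes carry far more than this.

```python
import operator

# Hypothetical operator table for the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, env):
    """Evaluate a tree by recursion: children first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]        # leaf naming a variable
    return node                 # leaf holding a constant

# (a + 2) * b with a = 3 and b = 4.
tree = ("*", ("+", "a", 2), "b")
value = evaluate(tree, {"a": 3, "b": 4})
```

No intermediate code is ever produced; the tree built by the parser is
the only representation, and the host language's call stack does the
work an explicit operand stack would do in a postfix machine.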

Compilers on older/smaller systems were sometimes divided into
multiple passes and wrote intermediate representations to disk, and
such output _could_ have been interpreted.

Language processors which output source code for another language (on
heritage Unix: struct, ratfor, and cfront for early C++) are usually
called preprocessors.

So...  Interpreters and preprocessors may perform much the same work
as compilers in their front ends, and may or may not be identified as
compilers.

Java (and UCSD Pascal?) have both a compiler (to virtual machine code)
and an interpreter (for the virtual machine code).

Clear as mud?

